A New Readability Yardstick *


Rudolf Flesch 
Dobbs Ferry, N. Y. 


In 1943 the writer developed a statistical formula for the objective 
measurement of readability (comprehension difficulty) (5, 6). The for- 
mula was based on a count of three language elements: average sentence 
length in words, number of affixes, and number of references to people. 
Since its publication, the formula has been put to use in a wide variety of 
fields. For example, it has been applied to newspaper reports (9, 20), 
advertising copy (1), government publications (19), bulletins and leaflets 
for farmers (3), materials for adult education (4), and children’s books 
(12). Its validity has been reaffirmed by five independent studies: the 
formula ratings of psychology textbooks substantially agreed with ratings 
by students and teachers (17); the formula scores rated specially edited 
radio news, newsmagazine, and Sunday news-summary copy “more read- 
able” than comparable newspaper reports (18); advertisements, rated 
“more readable” by the formula, showed higher readership figures (7); and
articles that were simplified with the aid of the formula brought increased
readership in two successive split-run tests (13, 14). Since 1943, a number
of academic institutions have incorporated the formula in the curriculum
of courses in composition, creative writing, journalism, and advertising;
it has also been used as the basis of several graduate research
projects.

Because of this wide application, it seemed worthwhile to re-examine 
the formula and to analyze its shortcomings. One of these is to be traced 
to the basic structure of the formula; others are the results of difficulties 
in its application. 

The structural shortcoming of the formula is the fact that it does not
always show the high readability of direct, conversational writing. For
example, in the study of psychology texts mentioned above (17), the score
of Koffka’s Principles of gestalt psychology (“the students’ choice for
unreadability”) was 5.4 (“difficult”); yet William James’ Principles of
psychology, a classic example of readability, rated 6.0 (bordering on “very
difficult”). Similarly, the formula consistently rates the popular Reader’s
Digest more readable than the sophisticated New Yorker magazine,
although many educated readers consider the Reader’s Digest dull and the
sprightly New Yorker ten times as readable.

*Samples from the main body of this paper, when tested for readability by the
method here proposed, had an average “reading ease” score of 30 and a “human interest”
score of 0. Presumably, the paper is easier to read than most other articles appearing
in scientific journals. The section, “The Formulas Restated,” which contains directions
for users of the formulas, has a “reading ease” score of 79 and a “human interest”
score of 42, which puts that portion of the article in the class of a good cookbook.

Aside from that, the practical application of the formula led to several 
minor misinterpretations. Sentence length, for instance, is the element 
with the heaviest weight; it is also the easiest to measure. As a result, 
this feature of the formula is often overemphasized, sometimes to the 
exclusion of the others—as in the directives that have been issued to staff 
writers of the Associated Press and the New York Times, recommending 
the use of shorter sentences in “leads.” On the other hand, the second
element (number of affixes) often seems difficult to apply; users of the
formula found this count particularly tedious and admitted to uncertainty
in spotting affixes. The third element (references to people)
raised no such questions; but it was sometimes felt to be arbitrary and the
underlying principle was often misunderstood.

In addition, many people found it hard to get used to the scoring 
system, which generally ranges from 0 (“very easy”) to 7 (“very difficult”).
Also, the average time needed to test a 100-word sample is six
minutes (4). This makes the application of the formula considerably 
faster than that of earlier formulas, which required reference to word 
lists (e.g. Gray-Leary (8) or Lorge (10)), but it is still too long for prac- 
tical use. 

The revision of the formula presented in this paper is an attempt to 
overcome these shortcomings and make the formula a more useful in- 
strument. 

Procedure 


The criterion used in the original formula was McCall-Crabbs’ Stand- 
ard test lessons in reading (11). The formula was so constructed that it 
predicted the average grade level of a child who could answer correctly 
three-quarters of the test questions asked about a given passage. Its 
multiple correlation coefficient was R = .74. It was partly based on 
statistical findings established in an earlier study by Lorge (19). 

For many obvious reasons, the grade level of children answering test
questions is not the best criterion for general readability. Data about
the ease and interest with which adults will read selected passages would
be far better. But such data were not available at the time the first
formula was developed, and they are still unavailable today. So McCall-Crabbs’
Standard test lessons are still the best and most extensive criterion
that can be found; therefore they were used again for the revision.
In reanalyzing the test passages, the following elements were used: 


(1) Average Sentence Length in Words. The same element was used 
in the previous formula, but the correlation coefficient used was taken 
from Lorge’s earlier findings. In the present study this coefficient was 
recomputed. 

(2) Average Word Length in Syllables, expressed as the number of syllables
per 100 words. The hypothesis was that this measure would
furnish results similar to the affix count in the earlier formula. Syllables 
are obviously easier to count than affixes since this work can be reduced 
to a mechanical routine. 

(3) Average Percentage of “Personal Words.” The same element was
used in the earlier formula. However, the opportunity was used to test 
a clarified definition, which made no significant difference in correlation. 
The new definition was stated as follows: All nouns with natural gender; 
all pronouns except neuter pronouns; and the words people (used with the 
plural verb) and folks. 

(4) Average Percentage of “Personal Sentences.” This new element
was designed to correct the structural shortcoming of the earlier formula, 
mentioned above. By hypothesis, it tests the conversational quality and 
the story interest of the passage analyzed. It was defined as the per- 
centage of the following sentences: Spoken sentences, marked by quota- 
tion marks or otherwise; questions, commands, requests, and other 
sentences directly addressed to the reader; exclamations; and grammati- 
cally incomplete sentences whose meaning has to be inferred from the 
context. 


To make the prediction more accurate, 13 of the 376 McCall-Crabbs’ 
passages that contained poetry or problems in arithmetic were omitted 
in the count of the first two elements, which are designed to test solely 
prose comprehension. However, these 13 passages were retained in the 
count of the last two elements, which are designed to test human interest. 

Following the procedure in the earlier study, intercorrelations were 
then computed. However, multiple correlation of the four elements 
with the criterion showed no significant gain in prediction value over 
the earlier formula in spite of the significant prediction value of the additional
fourth element by itself (r = -.27). Therefore, two multiple-correlation
regression formulas were computed: one using the first two
elements and one using the last two. This procedure had the advantage 
of giving independent predictions of the reading ease and the human interest
of a given passage.

Finally, the resulting twin formulas were expressed in such a way that
maximum readability (in both formulas) had a value of 100, and minimum 
readability a value of 0. This was done to make the scores more readily 
understandable for the practical user. 


Findings 


The intercorrelations, means, standard deviations, and regression
weights found are shown in Tables 1, 2, and 3. The following symbols
were used: wl for word length (syllables per 100 words), sl for sentence
length in words, pw for percentage of “personal words,” ps for percentage
of “personal sentences,” C50 for the average grade of children who could
answer one-half of the test questions correctly, and C75 for the average
grade of children who could answer three-quarters of the test questions
correctly.


Table 1
Correlations, Means, Standard Deviations, and Regression Weights
of Word and Sentence Length

         sl        C50        Mean        S.D.       β
wl     .4644     .6648      134.2208    13.6845    .5422
sl       -       .5157*      16.5213     5.5509    .2639

* After the preparation of this paper two articles appeared that pointed out a
computational error affecting the writer’s original formula (Dale, E. and Chall, Jeanne S.
A formula for predicting readability. Educ. Res. Bull., Ohio St. Univ., 1948, 27, 11-20,
28; Lorge, I. The Lorge and Flesch readability formulae: a correction. Sch. & Soc.,
1948, 67, 141-142). The error concerned the correlation coefficient between sentence
length and the criterion, which had originally been reported by Lorge as .6174; the
writer, acknowledging his debt to Lorge, used that figure without recomputation. The
corrected correlation coefficient is now reported as .4681 by Dale and Chall, and as
.467 by Lorge; this corresponds closely to the figure of .5157 reported in Table 1,
considering the fact that the writer now used a slightly better criterion of 363 passages for
sentence length. In other words, the formula presented in this paper incidentally
and independently also corrects the error found by Dale and Chall and by Lorge.


Table 2
Correlations, Means, Standard Deviations, and Regression Weights
of Personal Words and Sentences

         ps        C50        Mean        S.D.       β
pw     .2268    -.3881       7.3457      5.5175   -.3446
ps       -      -.2699      29.5745     35.5822   -.1917


Table 3
Means and Standard Deviations of Two Criteria

         Mean       S.D.
C50     5.4973     1.3877
C75     7.3484     2.1345





The two regression formulas based on these correlations are: 

Formula A (for predicting “reading ease”): RE = 206.835 - .846 wl - 1.015 sl.

The scores computed by this formula have a range from 0 to 100 for 
almost all samples taken from ordinary prose. A score of 100 corresponds 
to the prediction that a child who has completed fourth grade will be 
able to answer correctly three-quarters of the test questions to be asked 
about the passage that is being rated; in other words, a score of 100 indicates
reading matter that is understandable for persons who have completed
fourth grade and are, in the language of the U. S. Census, barely
“functionally literate.” The range of 100 points was arrived at by 
multiplying the grade level prediction by 10, so that a point on the formula 
scale corresponds to one-tenth of a grade. However, this relationship 
holds true only up to about seventh grade; beyond that, the formula 
under-rates grade level to an increasing degree. Finally, the formula,
which predicted grade level and, therefore, difficulty, was “turned
around” by reversing the signs to predict “reading ease.” (Before this
transformation, the formula read: C75 = .0846 wl + .1015 sl - 5.6835.)
The multiple correlation coefficient of this formula is R = .7047.
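The constants of Formula A can be retraced from the figures in Tables 1 and 3. The sketch below is only an illustration, not the writer's worksheet: it assumes the standard two-predictor regression identities, takes the correlations with C50 and the mean and standard deviation of C75 (the two-step procedure described in this section), and rounds the raw weights to four decimals before computing the constant. All variable names are invented for the example. The arithmetic also shows that 206.835 equals 150 plus ten times 5.6835, so that a score of 100 corresponds to a predicted grade of 5.0.

```python
import math

# Figures from Tables 1 and 3.
r_wl_sl, r_wl_c, r_sl_c = 0.4644, 0.6648, 0.5157   # intercorrelation, correlations with C50
mean_wl, sd_wl = 134.2208, 13.6845                  # syllables per 100 words
mean_sl, sd_sl = 16.5213, 5.5509                    # words per sentence
mean_c75, sd_c75 = 7.3484, 2.1345                   # criterion C75


def standardized_weights(r1c, r2c, r12):
    """Beta weights for two predictors (textbook identity, assumed here)."""
    denom = 1.0 - r12 ** 2
    return (r1c - r2c * r12) / denom, (r2c - r1c * r12) / denom


beta_wl, beta_sl = standardized_weights(r_wl_c, r_sl_c, r_wl_sl)
multiple_r = math.sqrt(beta_wl * r_wl_c + beta_sl * r_sl_c)

# Raw-score weights on the C75 scale, rounded to four decimals as printed.
b_wl = round(beta_wl * sd_c75 / sd_wl, 4)
b_sl = round(beta_sl * sd_c75 / sd_sl, 4)
intercept = mean_c75 - b_wl * mean_wl - b_sl * mean_sl

# "Turning the formula around": RE = 150 - 10 * C75.
re_constant = 150.0 - 10.0 * intercept
re_weight_wl = 10.0 * b_wl
re_weight_sl = 10.0 * b_sl
```

Under these assumptions the recomputed values agree with the printed ones to within rounding in the last digit.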

Formula B (for predicting “human interest”): HI = 3.635 pw 
+ .314 ps. 

Scores computed by this formula, too, have a range from 0 to 100. 
A score of 100 has the same meaning as in Formula A. It indicates read- 
ing matter with enough human interest to suit the reading skills and 
habits of a barely “functionally literate” person. A score of 0, however,
means here simply that the passage contains neither “personal words”
nor “personal sentences”; in contrast to Formula A, the two elements
counted here may be totally absent. Since the zero point could be fixed 
in this way, the scoring was arrived at by dividing the range between 0 
(absence of both elements) and 100 (prediction of completed fourth grade) 
by 100. The formula therefore contains no statistical constant. The 
signs were reversed in the same fashion as in Formula A. (Before transformation,
this formula read: C75 = -.1333 pw - .0115 ps + 8.6673.)
The multiple correlation coefficient of this formula is R = .4306. 
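The rescaling of Formula B can be checked in a few lines. This is a sketch under one assumption that the text does not state as a number: that "completed fourth grade" corresponds to a predicted grade of 5.0, the value implied by the constants of Formula A. The variable names are hypothetical.

```python
# Grade-level form of Formula B before transformation:
#   C75 = 8.6673 - .1333 pw - .0115 ps
intercept = 8.6673          # predicted grade when pw = ps = 0 (score 0)
b_pw, b_ps = 0.1333, 0.0115

# Assumption: a score of 100 corresponds to a predicted grade of 5.0.
grade_at_100 = 5.0

# Stretch the range between the two anchor points to 100 score points.
scale = 100.0 / (intercept - grade_at_100)

weight_pw = b_pw * scale    # weight of "personal words" in HI
weight_ps = b_ps * scale    # weight of "personal sentences" in HI
```

With these inputs the two weights round to 3.635 and .314, the coefficients of Formula B, and no constant term remains because the zero point is fixed at the absence of both elements.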

Since the correlations of three of the four elements with the criterion
C50 were higher than those with the criterion C75, the multiple correlation
with the criterion C50 was computed first. As a second step, the values
so found were used to predict criterion C75, since it seemed obviously
more desirable to predict 75% comprehension than 50% comprehension.

The correlation between the word length factor (syllable count) and 
the corresponding affix count in the earlier formula was found to be 
r = .87. For practical purposes the two measures may therefore be 
considered equivalent. 

The number of affixes per 100 words (a) can be predicted from the 
syllable count (wl) by the formula: a = .6832 wl - 66.6017. Conversely,
the number of syllables per 100 words (wl) can be predicted from
the number of affixes (a) by the formula: wl = 1.49 a + 94.56. 


Comment 


It is hoped that the two new formulas will prove more useful than the 
earlier formula. Formula A alone, with a correlation coefficient of .70, 
has almost as high a prediction value as the combined earlier formula 
whose correlation coefficient was .74. Formula B has a much lower 
correlation coefficient of .43 and, accordingly, does not seem to contribute 
much to the measurement of readability. It should be remembered, 
however, that because of the criterion used, Formula B predicts only 
the effect of the two “human interest” elements on comprehension; in
other words, the correlation coefficient shows only to what extent human
interest in a given text will make the reader understand it better. The
real value of this formula, however, lies in the fact that human interest
will also increase the reader’s attention and his motivation for continued
reading.

In addition, the two new formulas will be more useful for the teaching 
of writing, since the added factor and the division into two parts will 
show specific faults in writing more clearly. 

The significance of Formula A will be more easily understood when it 
is realized that the measurement of word length is indirectly a measure- 
ment of word complexity (as mentioned above, the correlation is r = .87) 
and that word complexity in turn is indirectly a measurement of ab- 
straction: the correlation between the number of affixes and that of ab- 
stract words was found to be .78 (5). Similarly, the measurement of 
sentence length is indirectly a measurement of sentence complexity. 
In two independent studies the correlation between these two factors 
was found to be .775 (8) and .72 (15). Sentence complexity, in turn, 
may again be considered as a measure of abstraction. Formula A, therefore,
is essentially a test of the level of abstraction.

It seems hardly necessary to prove the importance of human interest
in reading, as tested by Formula B. That people are most interested
in other people is an old truism. And the readability value of written
dialogue, as tested by the added element, is well described in the following,
oddly parallel quotations from a printer and a novelist: “Have you ever
watched people at a library selecting books for home reading? Other
things being equal, if they see enough pages that . . . promise interesting
dialogue, they are much more apt to put the book under their arm
and walk away with it, than if they see too many solid pages . . . which
always suggest hard work” (16). “‘What is the use of a book without
pictures or conversations?’ thought Alice just before the White Rabbit
ran by, in condemnation of the book her sister was reading, and this 
childish comment is supported by novel-readers of all degrees of in- 
telligence. Long close paragraphs of print are in themselves apt to 
dismay the less serious readers and their instinct here is a sound one, for 
an excess of summary and an insufficiency of scene in a novel make the 
story seem remote, without bite, second-hand. . . . A great part of the 
vigor, the vivacity and the readability of Dickens derives from his in- 
numerable interweavings of scene and summary; his general method is 
to keep summary to the barest essential minimum, a mere sentence or 
two here and there between the incredibly fertile burgeoning of his 
scenes” (2). 

In preliminary tests of the formulas, the following results were found: 

When the newly isolated fourth element (“personal sentences”) was
applied to the psychology texts by Koffka and James mentioned above
(17), it was found that the percentage of “personal sentences” in Koffka
was negligible (4%), whereas in James’s first volume it was 16% and in
his second volume 10%. A striking example of this difference in style
is the following of James’s “personal sentences”: “Ask half the common
drunkards you know why it is that they fall so often prey to temptation,
and they will say that most of the time they cannot tell.” This sentence
shows well the aspect of readability that eluded the earlier formula.

When the old and the new formulas were applied to two random copies
of the New Yorker (October 26, 1946) and the Reader’s Digest (November
1946), the results were as shown in Table 4.


Table 4
Comparative Analysis of The New Yorker (October 26, 1946) and the
Reader’s Digest (November, 1946)

                                           New Yorker    Reader’s Digest
Old Formula:
  Average sentence length in words             20              16
  Affixes per 100 words                        36              34
  Personal words per 100 words                 10               8
  Readability score                          3.59            3.05

New Formula A:
  Average sentence length in words             20              16
  Syllables per 100 words                     148             145
  “Reading ease” score                         61              68

New Formula B:
  Personal words per 100 words                 10               8
  Personal sentences per 100 sentences         39              15
  “Human interest” score                       49              34

As can be seen, the old formula rated the Reader’s Digest significantly 
more readable than the New Yorker; the new formula A also shows that 
the Reader’s Digest is significantly easier to read. But the new formula 
B clearly shows a large difference in human interest in favor of the 
New Yorker. 


The Formulas Restated 


For practical application, the formulas may be restated this way: 
To measure the readability (“reading ease” and “human interest”) of
a piece of writing, go through the following steps: 


Step 1. Unless you want to test a whole piece of writing, take samples. 
Take enough samples to make a fair test (say, three to five of an article 
and 25 to 30 of a book). Don’t try to pick “good” or “typical” samples. 
Go by a strictly numerical scheme. For instance, take every third 
paragraph or every other page. Each sample should start at the be- 
ginning of a paragraph. 

Step 2. Count the words in your piece of writing or, if you are using 
samples, take each sample and count each word in it up to 100. Count 
contractions and hyphenated words as one word. Count as words 
numbers or letters separated by space. 

Step 3. Count the syllables in your 100-word samples or, if you are 
testing a whole piece of writing, compute the number of syllables per 
100 words. If in doubt about syllabication rules, use any good dictionary. 
Count the number of syllables in symbols and figures according to the way 
they are normally read aloud, e.g. two for $ (“dollars”) and four for 1918
(“nineteen-eighteen”). If a passage contains several or lengthy figures, 
your estimate will be more accurate if you don’t include these figures in 
your syllable count. In a 100-word sample, be sure to add instead a 
corresponding number of words in your syllable count. To save time,
count all syllables except the first in all words of more than one syllable
and add the total to the number of words tested. It is also helpful to
“read silently aloud” while counting.
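The time-saving shortcut in Step 3 can be written out as follows. This is a minimal sketch: it assumes the syllable count of each word has already been looked up (for instance in a dictionary, as the step suggests), and the function name is made up for the example.

```python
def syllables_per_100_words(syllable_counts):
    """Word length (wl) from one syllable count per word.

    Uses the shortcut from Step 3: count only the syllables beyond the
    first in words of more than one syllable, then add the number of words.
    """
    n_words = len(syllable_counts)
    if n_words == 0:
        raise ValueError("need at least one word")
    extra = sum(count - 1 for count in syllable_counts if count > 1)
    total_syllables = n_words + extra
    return 100.0 * total_syllables / n_words


# "Count the syllables in your sample": 1, 1, 3, 1, 1, 2 syllables.
wl = syllables_per_100_words([1, 1, 3, 1, 1, 2])
```

For a sample of exactly 100 words the result is simply the total number of syllables; the division matters only when a whole piece of writing is tested.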

Step 4. Figure the average sentence length in words for your piece 
of writing or, if you are using samples, for all your samples combined. 
In a 100-word sample, find the sentence that ends nearest to the 100- 
word mark—that might be at the 94th word or the 109th word. Count 
the sentences up to that point and divide the number of words in those 
sentences by the number of sentences. In counting sentences, follow 
the units of thought rather than the punctuation: usually sentences are 
marked off by periods; but sometimes they are marked off by colons or 
semicolons—like these. But don’t break up sentences that are joined 
by conjunctions like and or but. 
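The cut-off rule in Step 4 can be sketched as a short routine. It assumes that the sentences have already been marked off by units of thought and that their lengths in words are given in order; the function name and the tie-breaking rule (the earlier sentence end wins) are assumptions of this example, not part of the directions.

```python
def average_sentence_length(sentence_lengths, target=100):
    """Average words per sentence up to the sentence end nearest `target`."""
    if not sentence_lengths:
        raise ValueError("need at least one sentence")
    running = 0
    best_gap = None
    best_sentences = 0
    best_words = 0
    for index, length in enumerate(sentence_lengths, start=1):
        running += length
        gap = abs(running - target)
        if best_gap is None or gap < best_gap:
            best_gap, best_sentences, best_words = gap, index, running
        if running >= target:
            break  # later sentence ends can only be farther from the mark
    return best_words / best_sentences


# Sentence ends fall at words 22, 40, 65, 94, and 109; the 94th word is
# nearest to the 100-word mark, so four sentences are counted.
sl = average_sentence_length([22, 18, 25, 29, 15, 30])
```

Here the sample is cut after the fourth sentence, giving 94 words in 4 sentences.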

Step 5. Figure the number of “personal words” per 100 words in
your piece of writing or, if you are using samples, in all your samples combined.
“Personal words” are: (a) All first-, second-, and third-person
pronouns except the neuter pronouns it, its, itself, and they, them, their, 
theirs, themselves if referring to things rather than people. (b) All words 
that have masculine or feminine natural gender, e.g. Jones, Mary, father, 
sister, iceman, actress. Do not count common-gender words like teacher, 
doctor, employee, assistant, spouse. Count singular and plural forms. 
(c) The group words: people (with the plural verb) and folks. 
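A rough mechanical version of the count in Step 5 might look like the sketch below. Several things in it are assumptions: the caller must supply the list of words with natural gender (names, father, sister, and so on), since no rule can find them automatically; a single flag decides whether they, them, and their refer to people; and the check that people takes a plural verb is left out. The names are invented for the illustration.

```python
import re

# Pronouns named in Step 5; the neuter it, its, itself are excluded.
PRONOUNS = {
    "i", "me", "my", "mine", "myself",
    "we", "us", "our", "ours", "ourselves",
    "you", "your", "yours", "yourself", "yourselves",
    "he", "him", "his", "himself",
    "she", "her", "hers", "herself",
}
THEY_FORMS = {"they", "them", "their", "theirs", "themselves"}
GROUP_WORDS = {"people", "folks"}

# Contractions and hyphenated words count as one word (Step 2).
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")


def personal_words_per_100(text, gendered_words, they_means_people=True):
    """Percentage of 'personal words' (pw) in `text`."""
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    personal = PRONOUNS | GROUP_WORDS | {w.lower() for w in gendered_words}
    if they_means_people:
        personal = personal | THEY_FORMS
    count = 0
    for word in words:
        # "we'll" and "man's" are judged by the part before the apostrophe.
        base = re.split(r"['’]", word.lower())[0]
        if base in personal:
            count += 1
    return 100.0 * count / len(words)


pw = personal_words_per_100(
    "Mary told her father that people like it.",
    gendered_words={"Mary", "father"},
)
```

In the example four of the eight words count (Mary, her, father, people), so pw is 50.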

Step 6. Figure the number of “personal sentences” per 100 sentences
in your piece of writing or, if you use samples, in all your samples combined.
“Personal sentences” are: (a) Spoken sentences, marked by quotation
marks or otherwise, often including so-called speech tags like “he
said” (e.g. “I doubt it.” / We told him: “You can take it or leave it.” /
“That’s all very well,” he replied, showing clearly that he didn’t believe
a word of what we said). (b) Questions, commands, requests, and other
sentences directly addressed to the reader. (c) Exclamations. (d)
Grammatically incomplete sentences whose full meaning has to be inferred
from the context (e.g. Doesn’t know a word of English. / Handsome,
though. / Well, he wasn’t. / The minute you walked out). If a
sentence fits two or more of these definitions, count it only once. Divide
the number of these “personal sentences” by the total number of sentences
you found in Step 4, and multiply by 100 to get the percentage.
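Only part of Step 6 lends itself to a mechanical test. The sketch below is a hedged illustration: it treats any sentence containing double quotation marks as spoken and any sentence ending in a question mark or exclamation point as a question or exclamation, which is cruder than the definitions above. Commands, requests, and incomplete sentences have to be marked by hand through the hypothetical `flagged` argument. Each sentence is counted once, however many definitions it fits.

```python
DOUBLE_QUOTES = '"“”'
CLOSING_MARKS = '"”’\')'


def is_personal_sentence(sentence):
    """Crude test for spoken sentences, questions, and exclamations."""
    text = sentence.strip()
    if any(mark in text for mark in DOUBLE_QUOTES):
        return True
    return text.rstrip(CLOSING_MARKS).endswith(("?", "!"))


def personal_sentences_per_100(sentences, flagged=()):
    """Percentage of 'personal sentences' (ps).

    `flagged` holds the positions (0-based) of sentences judged personal
    by hand, e.g. commands or grammatically incomplete sentences.
    """
    if not sentences:
        raise ValueError("need at least one sentence")
    flagged = set(flagged)
    count = sum(
        1
        for position, sentence in enumerate(sentences)
        if position in flagged or is_personal_sentence(sentence)
    )
    return 100.0 * count / len(sentences)


ps = personal_sentences_per_100(
    [
        "He lay face down on the operating table.",
        "“How you doing?” he asked.",
        "Handsome, though.",
        "Relief is not always permanent.",
    ],
    flagged={2},
)
```

In the example the second sentence is caught by its quotation marks and the third is flagged by hand, so two of four sentences count.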

Step 7. Find your “reading ease” score by inserting the number of
syllables per 100 words (word length, wl) and the average sentence length
(sl) in the following formula:


R.E. (“reading ease”) = 206.835 - .846 wl - 1.015 sl.


The “reading ease” score will put your piece of writing on a scale between
0 (practically unreadable) and 100 (easy for any literate person).
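As a worked check of Step 7, the formula can be applied to the rounded counts reported for the two magazines in Table 4. The function name is chosen for this example only.

```python
def reading_ease(wl, sl):
    """Formula A: wl = syllables per 100 words, sl = words per sentence."""
    return 206.835 - 0.846 * wl - 1.015 * sl


new_yorker = reading_ease(wl=148, sl=20)       # about 61.3
readers_digest = reading_ease(wl=145, sl=16)   # about 67.9
```

Rounded to whole points, these are the scores of 61 and 68 shown in Table 4. Scores recomputed from rounded counts may differ by a point from scores computed on the unrounded counts.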

Step 8. Find your “human interest” score by inserting the percentage
of “personal words” (pw) and the percentage of “personal sentences” (ps)
in the following formula:


H.I. (“human interest”) = 3.635 pw + .314 ps.


The “human interest” score will put your piece of writing on a scale
between 0 (no human interest) and 100 (full of human interest).
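Step 8 can be checked in the same way against the counts reported in Tables 4 and 7. Again the function name is only illustrative.

```python
def human_interest(pw, ps):
    """Formula B: pw = personal words per 100 words,
    ps = personal sentences per 100 sentences."""
    return 3.635 * pw + 0.314 * ps


new_yorker_1946 = human_interest(pw=10, ps=39)   # about 48.6
life_1947 = human_interest(pw=2, ps=0)           # about 7.3
```

Rounded, these give the scores of 49 and 7 reported in the tables.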

In applying the formulas, remember that Formula A measures length 
(the longer the words and sentences, the harder to read) and Formula 
B measures percentages (the more personal words and sentences, the 
more human interest). 

Roughly, “reading ease” scores will tend to follow the pattern shown 
in Table 5. 

“Human interest” scores will follow the general pattern shown in
Table 6. 

Table 5
Pattern of “Reading Ease” Scores

“Reading Ease”   Description        Typical         Syllables        Average Sentence
Score            of Style           Magazine        per 100 Words    Length in Words

0 to 30          Very difficult     Scientific      192 or more      29 or more
30 to 50         Difficult          Academic        167              25
50 to 60         Fairly difficult   Quality         155              21
60 to 70         Standard           Digests         147              17
70 to 80         Fairly easy        Slick-fiction   139              14
80 to 90         Easy               Pulp-fiction    131              11
90 to 100        Very easy          Comics          123 or less      8 or less
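The score bands of Table 5 can be turned into a small lookup. Since the bands share their end points (30 appears in two rows), the sketch assumes that a boundary score belongs to the easier band; that choice, like the function name, is an assumption of the example.

```python
import bisect

# Upper limits of the first six bands in Table 5, and the seven descriptions.
BAND_LIMITS = [30, 50, 60, 70, 80, 90]
DESCRIPTIONS = [
    "Very difficult",
    "Difficult",
    "Fairly difficult",
    "Standard",
    "Fairly easy",
    "Easy",
    "Very easy",
]


def describe_reading_ease(score):
    """Description of style for a 'reading ease' score, following Table 5."""
    return DESCRIPTIONS[bisect.bisect_right(BAND_LIMITS, score)]


label = describe_reading_ease(61)
```

A score of 61, for instance, falls in the “Standard” band.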





Table 6
Pattern of “Human Interest” Scores

Description           Typical        Percentage of     Percentage of
of Style              Magazine       Personal Words    Personal Sentences

Dull                  Scientific     2 or less         0
Mildly interesting    Trade                            5
Interesting           Digests        7                 15
Highly interesting    New Yorker     11                32
Dramatic              Fiction        17 or more





Sample Application 


As an example of the application of the new formulas, two recent 
descriptions of the “nerve-block” method of anesthesia will be used.
By an odd coincidence, these two variations upon a theme appeared within
the same week in Life (October 27, 1947) and The New Yorker (October 
25, 1947). The Life story served as text accompanying a series of 
pictures; it is straight reporting, not particularly simple, and lacks human 
interest (which was supplied by the pictures). The New Yorker passage 
is part of a personality profile, vivid, dramatic, using all the tricks of the 
trade to get the reader interested and keep him in suspense. 


From Life: 


Except in the field of surgery, control of pain is still very much in the 
primitive stages. Countless thousands of patients suffer the tortures of cancer, 
angina pectoris and other distressing diseases while their physicians are helpless 
to relieve them. A big step toward help for these sufferers is now being made 
with a treatment known as nerve-blocking. This treatment, which consists 
of putting a “block” between the source of pain and the brain, is not a new
therapy. But its potentialities are just now being realized. Using better 
drugs and a wider knowledge of the mechanics of pain gained during and since 
the war, Doctors E. A. Rovenstine and E. M. Papper of the New York Univer- 
sity College of Medicine have been able to help two-thirds of the patients 
accepted for treatment in their “pain clinic” at Bellevue Hospital. 

The nerve-block treatment is comparatively simple and does not have 
serious aftereffects. It merely involves the injection of an anesthetic drug 
along the path of the nerve carrying pain impulses from the diseased or injured 
tissue to the brain. Although its action is similar to that of spinal anesthesia 
used in surgery, nerve block generally lasts much longer and is only occasionally 
used for operations. The N. Y. U. doctors have found it effective in a wide 
range of diseases, including angina pectoris, sciatica, shingles, neuralgia and 
some forms of cancer. Relief is not always permanent, but usually the injec- 
tion can be repeated. Some angina pectoris patients have had relief for periods 
ranging from six months to two years. While recognizing that nerve block 
is no panacea, the doctors feel that results obtained in cases like that of Mike 
Ostroich (next page) will mean a much wider application in the near future. 


From The New Yorker: 


. . . Recently, [Rovenstine] devoted a few minutes to relieving a free 
patient in Bellevue of a pain in an arm that had been cut off several years 
before. The victim of this phantom pain said that the tendons ached and 
that his fingers were clenched so hard he could feel his nails digging into his 
palm. Dr. Rovenstine’s assistant, Dr. E. M. Papper, reminded Rovenstine 
that a hundred and fifty years ago the cure would have been to dig up the 
man’s arm, if its burial place was known, and straighten out the hand. Rovenstine
smiled. “I tell you,” he said. “We’ll use a two-percent solution of
procaine, and if it works, in a couple of weeks we’ll go on with an alcohol
solution. Procaine, you know, lasts a couple of weeks, alcohol six months or
longer. In most cases of this sort, I use the nerve block originated by Labat
around 1910 and improved on in New Orleans about ten years back, plus one
or two improvisations of my own.” (Nerve blocking is a method of anesthetizing
a nerve that is transmitting pain.) . . .

The man with the pain in the nonexistent hand was an indigent, and 
Rovenstine was working before a large gallery of student anesthetists and 
visitors when he exorcised the ghosts that were paining him. Some of the 
spectators, though they felt awed, also felt inclined to giggle. Even trained 
anesthetists sometimes get into this state during nerve-block demonstrations 
because of the tenseness such feats of magic induce in them. The patient,
thin, stark-naked, and an obvious product of poverty and cheap gin mills,
was nervous and rather apologetic when he was brought into the operating 
theatre. He lay face down on the operating table. Rovenstine has an easy 
manner with patients, and as his thick, stubby hands roamed over the man’s 
back, he gently asked, “How you doing?” “My hand, it is all closed together,
Doc,” the man answered, startled and evidently a little proud of the attention
he was getting. “You’ll be O.K. soon,” Rovenstine said, and turned to the
audience. “One of my greatest contributions to medical science has been the
use of the eyebrow pencil,” he said. He took one from the pocket of his white
smock and made a series of marks on the patient’s back, near the shoulder of 
the amputated arm, so that the spectators could see exactly where he was 
going to work. With a syringe and needle, he raised four small weals on the 
man’s back and then shoved long needles into the weals. The man shuddered 
but said he felt no pain. Rovenstine then attached a syringe to the first 
needle, injected the procaine solution, unfastened the syringe, attached it to 
the next needle, injected more of the solution, and so on. The patient’s face 
began to relax a little. “Lord, Doc,” he said. “My hand is loosening up a
bit already.” “You’ll be all right by tonight, I think,” Rovenstine said.
He was.





A comparative analysis of these two passages is shown in Table 7. 
The two passages furnish a good illustration of the stylistic features 
measured and emphasized by the two new formulas. 




















Table 7
Comparative Analysis of Treatment of Same Theme in Life and The New Yorker

                                              Life          New Yorker
                                           (290 words)     (495 words)
Old Formula:
  Average sentence length in words              22              18
  Affixes per 100 words                         48              35
  Personal words per 100 words                   2              11
  Readability score                           5.16            3.20

New Formula A:
  Average sentence length in words              22              18
  Syllables per 100 words                      165             145
  “Reading ease” score                          46              66

New Formula B:
  Personal words per 100 words                   2              11
  Personal sentences per 100 sentences           0              41
  “Human interest” score                         7              53


The Purdue Pegboard: Norms and Studies of
Reliability and Validity


Joseph Tiffin and E. J. Asher 
Division of Education and Applied Psychology, Purdue University 


Extensive use of the Purdue Pegboard¹ in testing industrial applicants,
veterans, college students, and individuals seeking vocational guidance 
has made available a considerable body of information regarding the re- 
liability and validity of the test, practice effects, and the general nature 
of finger dexterity. This paper summarizes various studies of the relia- 
bility and validity of the Purdue Pegboard dexterity tests and presents 
a revised set of norms for the tests together with a discussion of the im- 
plication of these findings for personnel testing. 

The Purdue Pegboard is a test of manipulative dexterity designed to
assist in the selection of employees in industrial jobs requiring
manipulative dexterity, such as assembly, packing, operation of certain
machines, and other routine manual jobs of an exacting nature. It provides
separate measurements of the right hand, left hand, and both hands together,
and measures dexterity for two types of activity: one involving gross
movements of hand, fingers, and arms, and the other involving primarily
what might be called “tip of the finger” dexterity needed in small
assembly work.

Construction 


Extensive observation and experiments have shown that people differ 
markedly in their ability to perform the manipulative operations that are 
required on many industrial jobs. Experiments have also shown that 
the basic dexterity of an employee as revealed by a manipulative dex- 
terity test is related to both quantity and quality of work on various jobs 
requiring such dexterity. Numerous dexterity tests, both in the form of 
pegboards and in other forms, have been in use for some time in many 
industrial plants. The Purdue Pegboard was designed to incorporate the 
desirable features of several of these tests into a simple and easily ad- 
ministered performance test. Of particular importance in its construc- 
tion and administration has been the standardization under conditions 
in which a group of applicants or employees can be tested simultaneously. 
An examiner using a battery of ten boards can test approximately 50
employees per hour, thus overcoming the tedious and high-cost
administration that has characterized many other dexterity tests. The present
form of the Purdue Pegboard has been standardized after extensive
experimentation in numerous plants which involved the testing of several
thousand employees in a wide variety of industrial jobs.

¹ Distributed for the Purdue Research Foundation by Science Research Associates,
228 S. Wabash, Chicago, Ill.





Fig. 1. The Purdue Pegboard.


The Test Scores. Five separate test scores may be obtained with the 
Purdue Pegboard, namely: Right Hand; Left Hand; Both Hands; Right 
plus Left plus Both Hands (abbreviated R + L + B); and Assembly. 


Administration and Scoring 


The Pegboard is equipped with the pins, collars, and washers located 
in the proper cups. The operator should be seated comfortably at a 
table approximately 30 inches high. The Pegboard should be directly 
in front of the operator with the cups containing the pins and other parts 
at the far end of the board. The extreme right and extreme left hand 
cups should each contain 25 pins. The pocket immediately to the right 
of the center should contain 20 collars, and the pocket immediately to
the left of the center should contain 40 washers. 

Right Hand Test. The testee is instructed to pick up one pin at a time 
with the right hand from the right hand cup and place these pins in the
right hand row, starting with the top hole. The testee is allowed to put 
in three or four pins for practice before this part of the test is begun. The
pins are then removed and the testee is allowed exactly 30 seconds to put
in as many pins as possible with the right hand, taking the pins from the 
right hand cup one at a time. The right hand score is the total number of 
pins the testee places with the right hand. 

Left Hand Test. The procedure described above is followed for the 
left hand. Practice with the left hand in placing three or four pins should 
precede the administration of this part of the test. The testee is then 
allowed exactly 30 seconds to take pins one at a time from the left hand cup 
and place them in the left hand row of holes starting with the top hole. The 
left hand score is the total number of pins the testee places with the left hand. 
After the right and left hand sequences have been administered, all pins 
are returned by the testee to the right and left cups, respectively. 

Both Hands Test. This sequence tests both hands working together. 
The testee simultaneously takes a pin from the right hand cup with the 
right hand and a pin from the left hand cup with the left hand, and sim- 
ultaneously places both pins in the two rows of holes, starting with the 
pair of holes farthest away from the testee. Practice in placing three or 
four pairs of pins should be allowed before this test sequence is given. 
After this practice and after all pins have been returned to their proper 
cups, the testee should be allowed exactly 30 seconds to place as many 
pairs of pins as possible, using both hands, each hand picking up and placing 
one pin at a time. The both hands score is the number of pairs of pins that
are placed during the 30 second test period. 

Right plus Left plus Both Hands. This score is obtained by combining 
the test scores obtained from the test sequences described above. The 
score is simply the number of pins placed with the right hand plus the 
number placed with the left hand plus the number of pairs placed with 
both hands. It should be remembered that the number of pairs of pins 
is used in adding the both hands score—not the number of pins placed 
with both hands. 

Assembly. This sequence tests more minute finger dexterity and 
consists of assembling the pins, collars, and washers. To make certain 
that the testee understands just what he is to do, the administrator should 
instruct him as follows: “Pick up one pin from the right hand cup with 
your right hand and while placing it in the top hole in the right hand row 
pick up a washer with your left hand. As soon as the pin has been placed, 
drop the washer over the pin. While the washer is being placed over the 
pin with the left hand, pick up a collar with the right hand. While the 
collar is being dropped over the pin, pick up another washer with the left 
hand and drop it over the collar. This completes the first ‘assembly’ 
consisting of a pin, a washer, a collar, and a washer. As the final washer
for the first assembly is being placed with the left hand, start the second
assembly immediately by picking up another pin with the right hand.
Place it in the next hole, drop a washer over it with the left hand; then a 
collar with the right hand, and so on, completing another assembly. 
Now, return pins, collars, and washers to their proper cups and get ready 
to start. You will be given one minute to make as many assemblies as 
you can.” 

The most important point to explain in the instruction for this se- 
quence is that both hands should be operating all of the time, one picking 
up a pin, one a washer, one a collar, and so on. If necessary, the testee 
should be allowed to assemble four or five complete pin-washer-collar- 
washer assemblies before this test is begun, in order to make certain that 
he fully understands the “alternating” procedure. The testee must keep
both hands moving at the same time. If he fails to do this, he should be 
given further instruction by the administrator. 

When the testee is familiar with the procedure of making the assem- 
blies, he should be allowed exactly one minute to make as many such assem- 
blies as possible. The score on the assembly test is the number of parts 
assembled during the one minute of testing time. If eight complete 
assemblies are made, the score is therefore 32, since each assembly con- 
sists of four parts. If six complete assemblies are made and the pin and 
first washer of the seventh assembly are properly placed at the end of 
the minute, the score is 24 plus 2 or 26. This method of scoring is simpler 
than using the number of complete assemblies because it eliminates the 
necessity of using quarters of an assembly in determining the score. 
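
The scoring rules described above can be summarized in a short sketch. The class and function names are hypothetical and are not part of the test manual; the arithmetic follows the text: the Both Hands score counts pairs, the combined score is right plus left plus pairs, and the Assembly score counts parts, four to a complete assembly.

```python
from dataclasses import dataclass

PARTS_PER_ASSEMBLY = 4  # pin, washer, collar, washer


@dataclass
class PegboardTrial:
    """Raw counts from one administration (hypothetical container)."""
    right_pins: int   # pins placed with the right hand in 30 seconds
    left_pins: int    # pins placed with the left hand in 30 seconds
    both_pairs: int   # PAIRS of pins placed with both hands in 30 seconds


def right_left_both(trial: PegboardTrial) -> int:
    """R + L + B: the both hands term is the number of pairs, not pins."""
    return trial.right_pins + trial.left_pins + trial.both_pairs


def assembly_score(complete_assemblies: int, extra_parts: int = 0) -> int:
    """Number of parts assembled in one minute.

    extra_parts counts parts properly placed in the unfinished assembly.
    """
    if not 0 <= extra_parts < PARTS_PER_ASSEMBLY:
        raise ValueError("extra_parts must be 0, 1, 2, or 3")
    return complete_assemblies * PARTS_PER_ASSEMBLY + extra_parts


# The two worked examples from the text.
print(assembly_score(8))      # eight complete assemblies
print(assembly_score(6, 2))   # six complete, plus pin and first washer
```

Counting parts rather than assemblies keeps every score an integer, which is the simplification the text describes.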


Norms 


Scores on the Purdue Pegboard dexterity tests have been accumulated 
over a period of several years from a number of different sources. Scores 
for one trial on each of the tests are available at the present time on the 
following groups: 


N 
College Men 461 
College Women 392 
Veterans (Men) 1958 
Industrial Applicants (Men) 865 
Industrial Applicants (Women) 4138 


An analysis of the scores from these groups reveals that two or more
of the groups do not differ significantly in mean score and variability.
This can be seen in Table 1 in which the means and standard deviations
for each population on each test are given. There is no appreciable
difference between the mean scores of college men and the mean scores of
veterans on the various tests. There is no appreciable difference between
college women and industrial women on the first four of the tests. A
sizable difference does exist, however, between the means of the two
groups on the assembly test. As would be expected from the data in
Table 1, it was found that the percentile norms for college men and
veterans were almost identical and that those for college women and
industrial women were the same except for the assembly test.








Table 1
Means and Standard Deviations of Various Groups on Purdue Pegboard

               College                  Industrial   College     Industrial
               Men         Veterans     Men          Women       Women
               N = 481     N = 1958     N = 865      N = 392     N = 4138

Right Hand
  M            16.43       16.75        15.87        17.76       17.70
  SD            1.80        1.92         2.09         1.98        1.83
Left Hand
  M            15.91       15.98        15.16        16.48       15.98
  SD            1.77        1.99         1.98         1.66        1.99
Both Hands
  M            13.33       13.27        12.53        13.93       14.15
  SD            1.50        1.69         1.79         1.55        1.55
Total
  M            45.67       45.97        43.57        48.16       47.55
  SD            4.02        4.86         4.94         4.17        4.62
Assembly
  M            37.52       36.72        33.07        39.08       36.68
  SD            5.71        5.84         6.25         5.44        6.76





In view of these facts a single set of norms was calculated for college 
men and veterans, and similarly a single set of norms was calculated for 
college women and industrial women on all the tests except the assembly 
test. One and three trial percentile norms are given in Table 2 for college 
men and veterans, industrial men, college women and industrial women. 

Complete data from which three trial norms could be calculated for the 
several groups were not available on so extensive a basis as the data for 
one trial. However, data showing the improvement on the second and 
third trials were available on approximately 500 college students. The 
improvement from one to two to three trials on each of the tests (excepting 
the score obtained by adding the right hand, left hand, and both hand 
scores) is shown in Table 3. The improvement in scores shown in this 
table was used in calculating the expected three trial scores for the in- 
dustrial and veteran populations. These extrapolated scores were used 
in deriving the three trial percentile norms shown in Table 2.


Table 2
Percentile Norms for the Purdue Pegboard

Men: Industrial         Men: Veterans and       Women: Industrial
Applicants              College Students        Applicants and
                                                College Students
   1 Trial    3 Trials     1 Trial    3 Trials     1 Trial    3 Trials
Score %ile  Score %ile  Score %ile  Score %ile  Score %ile  Score %ile

Right Hand
               62  100                 66  100
   21  100     60   99     22  100     64   99     22  100     66  100
   20   99     58   95     21   99     62   97     21   99     64   97
   19   98     56   87     20   98     60   96     20   93     62   92
   18   90     54   78     19   95     58   91     19   84     60   85
   17   77     52   65     18   86     56   83     18   68     58   76
   16   57     50   54     17   67     54   71     17   45     56   65
   15   41     48   43     16   46     52   56     16   27     54   48
   14   27     46   32     15   29     50   43     15   12     52   36
   13   13     44   23     14   14     48   32     14    5     50   23
   12    7     42   15     13    5     46   21     13    1     48   14
   11    3     40    9     12    1     44   13                 46    8
   10    1     38    6                 42    6                 44    4
               36    3                 40    3                 42    1
               34    1                 38    2
                                       36    1

Left Hand
                                       62  100                 64  100
   20  100     60  100     21  100     60   99     21  100     62   99
   19   99     58   98     20   99     58   96     20   99     60   97
   18   97     56   97     19   98     56   91     19   96     58   94
   17   90     54   92     18   92     54   83     18   90     56   89
   16   74     52   84     17   80     52   7?     17   75     54   79
   15   55     50   73     16   61     50   60     16   53     52   65
   14   38     48   61     15   43     48   47     15   29     50   50
   13   32     46   49     14   20     46   34     14   14     48   34
   12   10     44   38     13    9     44   20     13    5     46   22
   11    5     42   26     12    4     42   11     12    2     44   13
   10    3     40   17     11    2     40    7     11    1     42    6
    9    1     38   10     10    1     38    4                 40    3
               36    6                 36    2                 38    2
               34    4                 34    1                 36    1
               32    3
               30    2
               28    1

Both Hands
   17  100     52  100     17  100     52  100     19  100     56  100
   16   98     50   98     16   98     50   99     18   99     54   98
   15   96     48   97     15   93     48   96     17   98     52   97
   14   87     46   95     14   79     46   92     16   95     50   96
   13   72     44   90     13   55     44   84     15   84     48   91
   12   48     42   80     12   30     42   70     14   62     46   82
   11   26     40   69     11   12     40   52     13   37     44   65
   10   13     38   52     10    3     38   35     12   16     42   49
    9    6     36   38      9    1     36   21     11    5     40   33
    8    2     34   25                 34   11     10    2     38   17
    7    1     32   16                 32    5      9    1     36    9
               30    9                 30    2                 34    4
               28    5                 29    1                 32    2
               26    3                                         30    1
               24    1

Right, Left and Both
   55  100    172  100     57  100    176  100     59  100    182  100
   54   99    168   99     56   99    172   98     58   99    180   99
   52   96    164   97     54   97    168   97     56   97    176   98
   50   91    160   95     52   95    164   95     54   94    172   96
   48   84    156   91     50   87    160   93     52   85    168   94
   46   73    152   87     48   75    156   87     50   72    164   89
   44   59    148   80     46   55    152   79     48   56    160   81
   42   43    144   73     44   39    148   66     46   39    156   71
   40   28    140   62     42   23    144   54     44   24    152   61
   38   17    136   52     40   12    140   43     42   13    148   50
   36    9    132   43     38    5    136   33     40    5    144   38
   34    5    128   33     36    2    132   23     38    2    140   28
   32    1    124   23     35    1    128   15     37    1    136   19
              120   17                124    9                132   13
              116   11                120    5                128    7
              112    7                116    3                124    4
              108    4                112    1                120    2
              104    2                                        116    1
              103    1
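
As an illustration of how a table of norms is used, the sketch below looks up the percentile rank for a raw score. The data are the one trial Right Hand norms for male industrial applicants as read from Table 2; the function name is hypothetical. The treatment of scores that fall between or outside the tabled values (use the nearest tabled score at or below the raw score, and never go below the lowest tabled percentile) is an assumption made for the example, not a rule stated in the paper.

```python
# One trial Right Hand norms, male industrial applicants, as read from
# Table 2 (raw score -> percentile).
RIGHT_HAND_1_TRIAL_INDUSTRIAL_MEN = {
    21: 100, 20: 99, 19: 98, 18: 90, 17: 77, 16: 57,
    15: 41, 14: 27, 13: 13, 12: 7, 11: 3, 10: 1,
}


def percentile_for(score, norms):
    """Return the tabled percentile for a raw score.

    Assumption for this sketch: a score not listed in the table takes the
    percentile of the highest tabled score that does not exceed it, and a
    score below the table takes the lowest tabled percentile.
    """
    tabled = sorted(norms)
    if score < tabled[0]:
        return norms[tabled[0]]
    eligible = [s for s in tabled if s <= score]
    return norms[eligible[-1]]


print(percentile_for(16, RIGHT_HAND_1_TRIAL_INDUSTRIAL_MEN))
```

The same function works for the three trial columns, where only every second score is tabled.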
It is significant to note in Table 3 that the improvement on each test
from trial one to trial two is less than the standard error of the
difference. The improvement from trial two to trial three is still smaller.
The improvement from trial one to trial three, while larger than between 
trial one and trial two, is not large enough to satisfy the commonly used 
criterion of statistical significance. 


Table 3
Effect of Practice on the Purdue Pegboard Dexterity Tests
(N = 434 College Students)

                                        Difference between Means
                                   1st and         2nd and         1st and
Test                      M        2nd Trials      3rd Trials      3rd Trials

Right Hand 1st Trial     17.19
Right Hand 2nd Trial     18.42     1.23 ± 1.30      .39 ± 1.30     1.62 ± 1.33
Right Hand 3rd Trial     18.81

Left Hand 1st Trial      16.07
Left Hand 2nd Trial      16.99      .92 ± 1.15      .37 ± 1.15     1.29 ± 1.19
Left Hand 3rd Trial      17.36

Both Hands 1st Trial     13.68
Both Hands 2nd Trial     14.24      .56 ± 1.10      .33 ± 1.11      .89 ± 1.10
Both Hands 3rd Trial     14.57

Assembly 1st Trial       39.64
Assembly 2nd Trial       42.68     3.04 ± 3.63     2.00 ± 3.61     5.04 ± 3.56
Assembly 3rd Trial       44.68
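
The comparison of each gain with its standard error can be checked directly from Table 3. The sketch below assumes that the figure following each "±" is the standard error of the difference, and it uses 1.96 (a two-sided 5 per cent level) purely as an illustrative threshold; the paper does not say which criterion of significance it has in mind. The names are hypothetical.

```python
# (difference between means, standard error of the difference), from Table 3.
TABLE_3 = {
    "Right Hand": {"1-2": (1.23, 1.30), "2-3": (0.39, 1.30), "1-3": (1.62, 1.33)},
    "Left Hand":  {"1-2": (0.92, 1.15), "2-3": (0.37, 1.15), "1-3": (1.29, 1.19)},
    "Both Hands": {"1-2": (0.56, 1.10), "2-3": (0.33, 1.11), "1-3": (0.89, 1.10)},
    "Assembly":   {"1-2": (3.04, 3.63), "2-3": (2.00, 3.61), "1-3": (5.04, 3.56)},
}

ILLUSTRATIVE_THRESHOLD = 1.96  # assumption, not taken from the paper


def critical_ratio(difference, standard_error):
    """Gain expressed in units of its standard error."""
    return difference / standard_error


for test, comparisons in TABLE_3.items():
    for trials, (diff, se) in comparisons.items():
        ratio = critical_ratio(diff, se)
        verdict = "significant" if ratio >= ILLUSTRATIVE_THRESHOLD else "not significant"
        print(f"{test:<11} trials {trials}: ratio {ratio:.2f} ({verdict})")
```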





Whether one uses the one trial or three trial method of administration 
and the corresponding norms for the Purdue Pegboard dexterity tests 
depends basically upon the situation in which the tests are used, the 
groups to be tested, and the purpose for which the testing is being done. 
These considerations should in turn be viewed in the light of the reliabil- 
ities of the one and three trial scores. For jobs in which the success of 
the testing program depends more upon increasing the average success of 
employees placed than upon individual measurement and counseling, the 
single trial method of administration is often satisfactory. However, 
when the most precise measurement possible of every individual is desired, 
as in vocational guidance, it is recommended that the three trial method 
of administering the tests be followed. 

It can be seen in Table 1 that veterans are significantly superior on all 
of the Pegboard tests to industrial men. This difference in the two 
populations is reflected in the Table of Norms. The fact that norms
based upon industrial workers are too low for veterans has been pointed
out by Long and Hill² and by Strange and Sartain.³ This could well be
due to the fact that the norms for veterans are based upon those veterans
who have voluntarily consulted a Veterans Guidance Center, and
veterans who seek such counsel are not a randomly selected group of
veterans in general.

In this connection it is well to keep in mind that a table of norms is 
nothing more nor less than the scores of some group or groups on a 
particular test so scaled as to represent a sort of human measuring stick. 
The comparison of an individual against this measuring stick can only 
indicate where in the group the individual stands. If the statement, in 
numerical terms, of an individual’s position in the group does not pro- 
vide any useful information about the person or is not the information 
desired, nothing is to be gained from comparing the individual with the 
group or groups represented in the table of norms, even if no other norms 
are available. In counseling an individual or assessing his qualifications 
for a particular job it is well to keep in mind that the most significant in- 
formation regarding his test performance is a statement of where he stands 
in the group with which he expects or hopes to compete. A high school 
senior who is considering the advisability of industrial work in a parti- 
cular plant would find that a statement of his test performance in relation 
to present employees of the plant on the job under consideration is of 
more value than a statement of his position among high school seniors, 
even though norms for high school seniors are available. This point is 
further illustrated in personnel testing. It happens repeatedly that the 
information needed about an applicant is how he compares with the 
workers on the specific job for which he is an applicant, not with industrial 
workers in general. It is necessary in such cases to obtain separate norms 
for the specific job. 

The fact that norms for male industrial employees and veterans are 
significantly different does not, therefore, mean that veterans’ scores 
should always be interpreted in terms of veterans’ norms. Scores should 
usually be interpreted in terms of norms set up from a population from 
the job or jobs on which the men being tested may be vocationally placed.

² Louis Long and John Hill. Additional norms for the Purdue Pegboard.
Occupations, 1947, 26, 160-161.

³ J. R. Strange and A. Q. Sartain. Veterans’ scores on the Purdue Pegboard. J.
appl. Psychol., 1948, 32, 35-40.


Reliability


Table 4 summarizes the results of several studies on the reliability
of the several tests given by means of the Purdue Pegboard. The
reliability coefficients for the one trial method of administering and scoring
the several tests were obtained by correlating test-retest scores on the
groups indicated. The reliability coefficients for three trial administration
were obtained by stepping up the one trial reliabilities by means of
the Spearman-Brown prophecy formula. Stepped up reliabilities on a
two trial basis are not given in Table 4, because the norms given above
cover only one trial and three trial methods of administration. Users
of the Pegboard can readily compute what the reliabilities of the tests on
a two trial basis would be by stepping up the one trial reliabilities to tests
of double their present length.
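
The "stepping up" referred to here is the standard Spearman-Brown computation: a test lengthened k times has reliability kr / (1 + (k - 1)r), where r is the reliability at the present length. The following sketch applies it to the one trial coefficients of Table 4; the function name is illustrative. The Left Hand value of .60 steps up to .82 for three trials, which agrees with the three trial figure printed in the table.

```python
def spearman_brown(reliability, length_factor):
    """Reliability of a test lengthened `length_factor` times."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# One trial test-retest reliabilities from Table 4.
ONE_TRIAL = {
    "Right Hand": 0.63,
    "Left Hand": 0.60,
    "Both Hands": 0.68,
    "Right + Left + Both": 0.71,
    "Assembly (college students)": 0.68,
    "Assembly (radio tube mounter trainees)": 0.76,
}

for test, r in ONE_TRIAL.items():
    two = spearman_brown(r, 2)
    three = spearman_brown(r, 3)
    print(f"{test:<40} 1 trial {r:.2f}   2 trials {two:.2f}   3 trials {three:.2f}")
```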


Table 4
Reliability Coefficients for the Purdue Pegboard Dexterity Tests

                                                                   1        3
Test                   Group                                N      Trial    Trials***

Right Hand             College Students (men and women)     434    .63*
Left Hand              College Students (men and women)     434    .60*     .82
Both Hands             College Students (men and women)     434    .68*
Right + Left + Both    College Students (men)               175    .71*
Assembly               College Students (men and women)     434    .68*
Assembly               Radio Tube Mounter Trainees (women)  233    .76**

* Test-retest reliabilities of college students at Purdue University.
** From L. V. Surgent. The use of aptitude tests in the selection of radio tube
mounters. Psychol. Monogr. 1947, 61, 1-40.
*** Three trial reliabilities obtained in each case by “stepping up” one trial reliability
by means of the Spearman-Brown prophecy formula.


For many industrial purposes, the reliabilities of the one trial tests 
are sufficiently high to justify the use of this method of test administra- 
tion, provided, of course, that the tests are found to have significant 
validity for the particular jobs for which they are to be used. 

In an industrial situation where the highest expectancy of a validity 
coefficient is in the vicinity of .50, there is little to be gained by lengthen- 
ing the tests to increase their reliabilities above the values for one trial 
administration given in Table 4. In the usual formula for correction for 
attenuation, if .50 is, by definition, the “ceiling” of expected validity, an
improvement from a .60, for example, to a .90 reliability coefficient of 
the test will only increase the obtained validity coefficient from .40 to .47, 
assuming that the reliability of the criterion is 1.00. From the functional 
viewpoint, a test having a validity coefficient of .40 can be made to work 
as satisfactorily as one having a validity coefficient of .47 if it is possible
to reduce slightly the selection or placement ratio. When employees
are being hired for a variety of jobs (as is nearly always the case in an ex- 
pansion of personnel) such a reduction is usually feasible. Therefore, to 
use the one trial method of administration of the test is often as satis- 
factory in an industrial situation as the longer and more reliable three 
trial method. 
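
The arithmetic behind this argument can be sketched as follows. Under the usual correction for attenuation, the obtained validity equals the "ceiling" validity multiplied by the square root of the product of the test and criterion reliabilities. The function name is illustrative. With a ceiling of .50 and a criterion reliability of 1.00, a test reliability of .90 gives .47, as stated; a reliability of .60 gives about .39, which the text reports as .40.

```python
import math


def obtained_validity(ceiling_validity, test_reliability, criterion_reliability=1.0):
    """Validity expected after attenuation by unreliability."""
    return ceiling_validity * math.sqrt(test_reliability * criterion_reliability)


for reliability in (0.60, 0.90):
    value = obtained_validity(0.50, reliability)
    print(f"test reliability {reliability:.2f}: obtained validity {value:.3f}")
```

The gain from raising reliability from .60 to .90 is therefore less than a tenth of a point in the validity coefficient, which is the basis for preferring the shorter one trial administration in many industrial settings.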

Validity


Generalizations concerning the validity of any test should be made 
with great caution, and this is particularly true of dexterity tests. As 
Seashore⁵ has reported, motor skills are quite specific and ordinarily not
highly correlated with each other. This situation perhaps accounts for 
the fact that a given dexterity test may have a quite satisfactory validity 
for certain manipulative jobs and be unsuitable for other manipulative 
jobs which might seem to be very similar. While the motor skills meas- 
ured by the different Pegboard tests may not be as specific as those re- 
ferred to by Seashore, the intercorrelations shown in Table 5 indicate that 


Table 5
Intercorrelations of Three Trial Scores on the Purdue Pegboard
Dexterity Tests (N = 434 College Students)
Note: Coefficients corrected for attenuation (bold face type in the original)
are given on the second line of each row.

                     Right      Left       Both
                     Hand       Hand       Hands      Assembly

Right Hand                      .59        .69        .52
  (corrected)                   .71        .81        .61
Left Hand            .59                   .67        .50
  (corrected)        .71                   .80        .60
Both Hands           .69        .67                   .58
  (corrected)        .81        .80                   .67
Assembly             .52        .50        .58
  (corrected)        .61        .60        .67





the Pegboard tests do measure somewhat different skills. This is more
true of the assembly test than it is of the other three tests, as one might
expect from the nature of the performances required on the assembly test.
Each of the tests is sufficiently unique to indicate that it is highly
desirable to conduct a study of the validity of the separate tests among
employees on specific jobs for which the use of each test is contemplated.
A number of studies of this type have been conducted, and a brief
summary of the results is given in Table 6. The validity coefficients obtained
in these several studies are given in the last column of Table 6, and furnish
a representative sample of the validity coefficients that may be expected
on various manipulative jobs with various criteria of job success.

⁴ As described by H. C. Taylor and J. T. Russell. The relationship of validity
coefficients to the practical effectiveness of tests in selection: Discussion and tables.
J. appl. Psychol., 1939, 23, 565-578.

⁵ R. H. Seashore. Standard motor skills unit. Psychol. Monogr., 1928, 39, 51-66,
and Individual differences in motor skills. J. gen. Psychol., 1930, 3, 38-66.


Table 6
Results of Validity Studies with the Purdue Pegboard

              No. of
Test          Trials    Job                         Criterion                           N

Right Hand              Light machine operation     Make-up pay while learning          17
Left Hand               Light machine operation     Make-up pay while learning          17
Both Hands              Light machine operation     Make-up pay while learning          17
R + L + B               Light machine operation     Make-up pay while learning          16
Assembly                Light machine operation     Make-up pay while learning          16
Right Hand              Light machine operation     Earnings after learning period      17
Left Hand               Light machine operation     Earnings after learning period      17
Both Hands              Light machine operation     Earnings after learning period      17
R + L + B               Light machine operation     Earnings after learning period      16
Assembly                Light machine operation     Earnings after learning period      16
Assembly                Textile quilling            Production Index                    28
Right Hand              Simple assembly of small    Production Index                    15
Assembly                Simple assembly of small    Production Index                    15
Assembly                Radio tube mounters*        3 or more pooled overall ratings    233

* From Surgent, op. cit.




Summary 


Extensive Purdue Pegboard norms on several male and female popula- 
tions have been obtained from various industrial users of this test during 
the past few years. The groups on which test scores were available have 
been combined whenever such combination was justified by statistical 
similarity of groups in mean and standard deviation. After the indicated 
combinations were made, separate tables of percentile norms were set up 
for the following groups:

                       Men                            Women

Right Hand             1. Industrial applicants       1. Industrial applicants combined
                       2. Veterans combined with         with college students
                          college students

Left Hand              1. Industrial applicants       1. Industrial applicants combined
                       2. Veterans combined with         with college students
                          college students

Both Hands             1. Industrial applicants       1. Industrial applicants combined
                       2. Veterans combined with         with college students
                          college students

Right + Left + Both    1. Industrial applicants       1. Industrial applicants combined
                       2. Veterans combined with         with college students
                          college students

Assembly               1. Industrial applicants       1. Industrial applicants
                       2. Veterans combined with      2. College students
                          college students




Norms for the above groups are given for both the one trial and three 
trial method of administration. The three trial norms were extrapolated 
from one trial norms, using data on the effect of practice obtained from 
one trial, two trial, and three trial administration of the Pegboard tests 
to 434 college students. 

Intercorrelations of the several Pegboard tests were computed from 
scores of 434 college men and women. The intercorrelations ranged from 
.50 to .69.

A summary of validity studies of the Pegboard for several industrial 
jobs is included. The obtained validity coefficients from 14 studies 
ranged from .07 to .76. The variation of these validity coefficients among 
industrial jobs, all of which were of a manual repetitive type, serves to 
re-emphasize the fact that the validity of the Pegboard should be sepa- 
rately determined for each job for which its use is contemplated. 


Received February 24, 1948. 
Early publication.





The Development of Entrance Tests for the 
United States Coast Guard Academy * 


Sidney H. Newman and Joseph M. Bobbitt 
United States Public Health Service, Washington, D. C. 


The psychological program instituted at the United States Coast 
Guard Academy during the late war (1, 2, 3, 4) has continued in full 
force. One major objective of this program has been to improve the 
methods of selecting Cadets for the Academy, and it is now possible to 
indicate the changes which have been effected to date in Cadet selection 
methods. The program furnishes an excellent illustration of the way in 
which psychological research and application need to be coordinated to 
obtain satisfactory results in developing selection methods for an institu- 
tion such as the Academy. It also gives a concrete example of the high 
degree of cooperation that can develop between military officials and 
psychologists, both civilian and military, when they learn the value of 
their respective methods of approach to mutual problems. The manner 
of the operation of the Academy program, including full freedom for 
research and publication of findings, supplies one answer to Tyson’s (8) 
criticisms of military psychology, lending support to the views of Older 
(6) on this point. 

The Coast Guard Academy offers a four year course leading to a 
Bachelor of Science degree in engineering and a commission as ensign. 
In addition to the collegiate academic work at a high level of difficulty, 
the Cadet receives training in professional Coast Guard and Naval sub- 
jects and intensive work in physical education. He is under the super- 
vision of the Academy for eleven months out of the year, and every effort 
is made to determine whether or not the Cadet exhibits the academic, 
personality, and physical qualifications considered necessary for Coast 
Guard officers. Therefore, Cadet selection needs to be much more 
rigorous and comprehensive than does the selection of college students. 
Approximately 150 Cadets are selected each year from an applicant pop- 
ulation which has ranged from 700 to 2200 young men in recent years. 


* The opinions or assertions contained in this paper are those of the authors and
are not to be construed as official or as reflecting the views of the U. S. Coast Guard.


History


Cadets are selected only on the basis of annual national competitive
tests, examinations, and evaluations (conducted each May for July
entrants). Before the war, candidates were required to take written essay
type examinations in high school English (three hours covering grammar, 
composition, and literature) and mathematics (three and a half hours 
covering algebra, plane and solid geometry, and trigonometry) (10). 
These tests were constructed by the respective Academy departments. 
In addition, the examiners (Civil Service and Coast Guard officers) in- 
terviewed each candidate and reported on his general fitness and adapta- 
bility for service as a Coast Guard officer. After the examination papers 
were graded on a percentage basis, an Adaptability Board of three com- 
missioned officers considered the general adaptability (leadership and 
personality characteristics) of those candidates who received not less than 
70 per cent in mathematics and English. The Board assigned marks in 
adaptability based on the examiner’s report, information on previous 
scholastic work, leadership achievements, and athletic and physical char- 
acteristics. The final mark was obtained by averaging the percentage 
grades in mathematics, English, and general adaptability. 
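
The pre-war marking procedure just described amounts to a screen followed by an average. A minimal sketch, with hypothetical names, assuming that all three grades are on the same percentage scale and that a candidate below 70 per cent in either written subject receives no final mark:

```python
MINIMUM_WRITTEN_GRADE = 70.0  # per cent, required in both mathematics and English


def final_mark(mathematics, english, adaptability):
    """Average of the three percentage grades, or None if the candidate
    does not reach the Adaptability Board (assumed handling for this sketch)."""
    if mathematics < MINIMUM_WRITTEN_GRADE or english < MINIMUM_WRITTEN_GRADE:
        return None
    return (mathematics + english + adaptability) / 3


print(final_mark(80, 90, 85))   # considered by the Board and averaged
print(final_mark(65, 90, 85))   # screened out on mathematics
```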

On the basis of research findings and experience, changes in the 
national competitive procedures have taken place gradually; an entirely 
new system was put into effect at the time of the entrance examinations 
conducted in 1947. Research was begun in July, 1943, involving the 
class entering at that time and graduating in June, 1946. In the 1944 
tests, a vocabulary test developed by the personnel of the psychological 
program, an objective English test furnished by the Cooperative Test 
Service, and a 20 item Personal Inventory, selected from the Navy Per- 
sonal Inventory (7), were introduced. The Personal Inventory consti- 
tuted the first attempt to appraise objectively some of the characteristics 
considered in the adaptability grade. The written problems in mathe- 
matics and the English essay were retained. 

The tests given in 1945 (11) contained objective English and mathe- 
matics examinations obtained by special arrangement with the Measure- 
ment and Guidance Project in Engineering Education. The English 
essay, the vocabulary test, and some written problems in mathematics 
were also administered. The Personal Inventory, U. S. Coast Guard
Academy form, was expanded to include the 60 items of the Navy Personal
Inventory and 85 new experimental items based on Academy research
and clinical experience. A highly difficult test of quantitative and
verbal aptitudes, very ably constructed by L. L. and T. G. Thurstone on 
the basis of data furnished by Academy psychologists, was introduced in 
the 1945 competition. At this time a rating scale covering eleven charac- 
teristics was devised to aid examiners in reporting the results of inter- 
viewing the candidates. 


1 437 West 59th Street, New York City.
In the tests and examinations administered in 1946 the mathematics
and English tests were completely objective, constructed by the College 
Entrance Examination Board (9), which utilized the findings of the 
Academy research program. The expanded Personal Inventory and the 
aptitude tests were continued. An Index of Background, Activities, and 
Preferences, developed at the Academy to aid in the evaluation of adapta- 
bility characteristics, was instituted. During the entire period under 
discussion, 1944—46, the final entrance mark was calculated by averaging 
the percentage grades in English, mathematics, and adaptability. 


New Annual Competitive Tests 





The entrance tests administered in 1947 represented the culmination of 
gradual changes which have resulted in a new system. The Regulations 
Governing Appointments to Cadetships, revised June, 1946 (12), described 
the aims and methods of the competitive examinations. This discussion 
was based on findings and recommendations presented in a progress re- 
port on the Academy psychological program (3) and other relevant in- 
formation. The Regulations stated, “Successful completion of the 
Academy course and success as an officer depends (1) on an adequate 
educational background, (2) on the possession of aptitudes relative to 
both technical and cultural studies, (3) on a sincere interest in the Coast 
Guard as a career, and (4) on relevant personality and physical charac- 
teristics. . . . The complete examination will measure as fairly and 
accurately as possible the extent to which the candidate meets the four 
general qualifications listed above. The tests will be objective in form 
except that candidates may be required to write one or more short English 
essays on specified subjects.” 

The Regulations indicated that tests would be given covering the 
following fields: 


A. Achievement tests. 
1. English (Grammar, Composition, Literature, and Reading 
Comprehension). 
2. Social Studies (American History, American Government or 
Political Science, Economics, and Current Events). 
3. Mathematics (Algebra, Plane Geometry, and Plane Trigo- 
nometry). 
4. Science (Physics, Chemistry, or both). 
B. Aptitude and ability tests. 
1. Quantitative, mathematical ability. 
2. Verbal or linguistic ability. 
3. Ability to visualize spatial relations. 
4. Mechanical comprehension and ability to deal with mechanical
problems.
5. Aptitudes involved in scientific comprehension, study, and
research.


In addition to the components of the entrance examinations listed 
above, the Regulations stated that tests of emotional stability, social 
adjustment, interests, background, and personality characteristics can 
be administered to aid the Adaptability Board in the evaluation of 
adaptability. 

The Regulations stipulated that the raw test scores are to be converted 
to standard scores. The Board may then set separate minimum require- 
ments in terms of standard scores on the achievement and aptitude tests, 
and candidates who fall below these levels can be eliminated from further 
consideration. The final mark of each candidate is computed by aver- 
aging the six sub-scores (all converted to standard scores) in accordance 
with the indicated weights: 


                              Weights
English                          20
Social Studies                   10
Mathematics                      20
Science                          10
Aptitudes and Abilities          10
General Adaptability             30
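
The weighted averaging described above can be sketched as follows. This is only an illustration: the weights are those listed in the Regulations, but the function name and the candidate's standard scores are invented, and the Regulations as quoted here do not state the scale on which standard scores are expressed.

```python
# Weights as listed in the Regulations (they sum to 100).
WEIGHTS = {
    "English": 20,
    "Social Studies": 10,
    "Mathematics": 20,
    "Science": 10,
    "Aptitudes and Abilities": 10,
    "General Adaptability": 30,
}


def final_mark(standard_scores):
    """Return the weighted average of the six standard-score sub-scores."""
    missing = set(WEIGHTS) - set(standard_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())
    weighted_sum = sum(WEIGHTS[name] * standard_scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical candidate; these scores are invented for illustration only.
candidate = {
    "English": 600,
    "Social Studies": 500,
    "Mathematics": 650,
    "Science": 550,
    "Aptitudes and Abilities": 500,
    "General Adaptability": 600,
}
mark = final_mark(candidate)
```

Because General Adaptability carries a weight of 30, a given change in that sub-score moves the final mark three times as far as the same change in Social Studies, Science, or Aptitudes and Abilities.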


Candidates are offered appointments in the order of their final marks 
until the vacancies for the year have been filled. A candidate who fails 
to be appointed may compete again in subsequent years without prejudice,
provided he still meets the age and physical requirements.*

The entrance examinations administered in May, 1947, were com- 
pletely objective and included the following: mathematics (two hours), 
English (two hours), social studies (one hour), science (one hour), verbal 
aptitude (35 minutes), spatial aptitude (35 minutes), mathematical apti- 
tude (35 minutes), Personal Inventory, Coast Guard Academy form (45 
minutes), and an expanded Index of Background, Activities, and
Preferences (90 minutes). The College Entrance Examination Board
constructed the achievement and aptitude tests, and will continue to do so
in the future. 


* In order to be qualified to take the competitive tests a candidate must be between 
the ages of 17 and 22. At the time of the tests, he must have completed satisfactorily
three and one-half years of high school work with 7 units in required work, including 
three in mathematics, three in English, and one in physics. He must also pass a pre- 
liminary physical examination before being allowed to take the entrance tests. In 
addition to sufficiently high standing on the entrance examinations, high school gradua- 
tion is required for entrance to the Academy. 
It is not expected that the kinds of tests or the weightings given the
test scores will remain static. It is anticipated that the tests and the
methods of computing the final entrance mark will change in accordance 
with research findings. 


Research 


The changes in selection methods described above have been based on 
a continuing research program without which progress in test develop- 
ment would have been virtually impossible. A central objective of this 
research program has been to aid in the development of both academic 
and non-academic (personality, leadership, professional, athletic) criteria 
of success at the Academy, and to discover the psychological correlates, 
intellectual and non-intellectual, of such criteria. Since June, 1945, the 
College Entrance Examination Board has been an extremely important 
part of the test development and research. A comprehensive series of
tests, requiring about 20 hours of testing time, is administered to each new 
entering class during the preliminary summer term. Psychological in- 
terviews are also conducted with the entering classes. The interviewing 
method has been described, and the validity and reliability of the method, 
as used with Reserve officer candidates, have been found to be satis- 
factory (1, 2, 5). Beginning in July, 1943, the testing program at first 
utilized commercial and Navy tests, in addition to those constructed at
the Academy (1). In the early stages of the program testing time was
allotted on an informal basis. The experimental testing program has 
now developed to the point where it is formally inserted into the prelimi- 
nary summer term schedule, and it makes use of many tests specially 
constructed by the College Board for the program. 

The most important functions of the experimental testing program are: 


(1) To discover which psychological characteristics, as measured by 
the tests and interviews, are related to academic, non-academic, 
professional, and athletic performance at the Academy. 

(2) To aid in the development of suitable criteria of success at the 
Academy. 

(3) To discover the best methods and items for measuring character- 
istics which have been found to be related to Academy perfor- 
mance, broadly defined. 

(4) To discover relationships between the tests, Academy perfor- 
mance, and subsequent behavior as a Coast Guard officer. 

(5) To pretest items and develop a library of test items for the en- 
trance tests. 

(6) To compare entering classes with each other and with other 
groups. 
(7) To check the consistency of research findings from year to year,
and to keep a check on the effects that changes in curriculum and 
methods for dealing with Cadets may have on previous findings. 

(8) To develop scoring keys for tests of personality, interest, and 
background. 


This is not the place to show specifically how research findings have 
given rise to the development of tests and scoring keys, and how testing 
has produced further research activities. It is enough to say that the 
theoretical and practical aspects of the program are closely interrelated. 
It is also not intended to include research findings at this time. Studies 
are in progress of the relationships between test scores and various
academic and non-academic measures of performance at the Academy.
Intercorrelation methods, including factor analysis, are being used to study
such problems. Test score comparisons and item analyses are being 
carried forward through bi-serial correlation and contrasting group 
studies. Criterion analyses constitute basic and important parts of the 
research program. It is hoped that the research will produce data of both 
theoretical and practical import. Thus far, this has been the case. 


Comment 


It must be apparent that such a program as has been described in
this paper could not develop without the wholehearted cooperation and 
support of Coast Guard officials, particularly the Academy administration 
and staff. Most important has been the recognition that a program of 
this kind requires continued research. It is absolutely necessary to con- 
struct yearly entrance tests, and well-validated tests made up of pre- 
tested items could not be long forthcoming without such supporting 
research. It should also be pointed out that the research problems pre- 
viously mentioned still need much work in order to further their solution. 

The cooperation that exists between the College Entrance Examina- 
tion Board and the Academy should also be stressed. The psychological 
program cannot progress without proper testing materials, such as those 
now furnished by the College Board. The yearly entrance tests require 
the services of a staff of test construction experts and it is much more 
satisfactory and economical to make use of a well-developed staff than 
to attempt to build one. The construction of the entrance tests is not a 
sufficiently large undertaking to warrant the permanent services at the 
Academy of a staff such as is available at the College Board. The College 
Board also gives invaluable aid to research projects, and furnishes statis- 
tical services when necessary. 
Summary


The gradual evolution of the national competitive examinations for 
the United States Coast Guard Academy has been described. The tests 
and examinations have developed from completely subjective to com- 
pletely objective types. Scoring methods have been changed from per- 
centage grades to standard score procedures. The achievement testing 
has been broadened and aptitude tests have been added. Tests have 
been introduced to furnish objective data on those personality and leader- 
ship characteristics which are termed general adaptability. Computation 
of the final mark upon which entrance to the Academy is based has been 
changed considerably. The interrelations between research and applica- 
tion have been stressed. The cooperative effort involving Coast Guard 
officials, Academy psychologists, and the College Entrance Examination 
Board has been emphasized. 


Received October 29, 1947. 
An Analysis of Grievances and Aggrieved Employees
in a Machine Shop and Foundry * 


Arthur C. Eckerman 
Division of Education and Applied Psychology, Purdue University 


The person working on the labor relations front in industry knows 
how slowly progress is being made in bringing labor and management 
to a closer understanding of their mutual problems, problems which are 
inevitably reflected in the endless grievances that must be processed. 
Any program which simply proposes a different method of handling griev- 
ances is only dealing with symptoms; the real causes underlying the 
complaints of labor will lie untouched. 

Various discussions concerning the problems of present day labor re- 
lations led to the research represented by this paper. The hypothesis 
was developed that a statistical analysis of grievances might indicate 
significant differences existing between employees having grievances and 
non-aggrieved employees. 

A large Midwestern plant allowed this study to be made of its griev- 
ances and aggrieved employees. The name of the city, state, and com- 
pany is withheld, and the nature of the company’s products is unimpor- 
tant to the research. Two unions had contracts with the plant, a machine 
shop union and a foundry union. 

Grievances at the plant were divided into two classes, oral and written. 
As no record was kept of grievances in the first and second steps it was 
impossible to get an estimation of the number and the nature of these 
grievances. After the grievance had been reduced to writing in the third 
step a complete and accurate file was kept on it regardless of its
disposition. Therefore, in this study of the grievances of the plant only those
grievances were used that had reached the third step with subsequent 
reduction to writing. 

This situation was fortunate from two standpoints. First, disre- 
garding steps one and two probably reduced the size of the study con- 
siderably and simplified it. Secondly, by not using the first two steps of 
the grievance procedure the results are probably more valid from the 
standpoint of being an accurate description of real labor problems in the 

* This article is based on the author’s dissertation of the same title submitted to the
Faculty of Purdue University in partial fulfillment of the requirements for the degree
of Doctor of Philosophy, February, 1948. The dissertation was directed by Dr. Joseph
Tiffin.
plant. Grievances in steps one and two may perhaps more correctly be
described as complaints of employees. Only when these complaints are 
found to be real differences of opinion between the thinking and the pro- 
gram of the union, on one hand, and the thinking and the policies of the 
company, on the other, may they be considered true grievances. At 
this stage they are formalized by writing and are taken out of the hands 
of operating supervision and operating union officials alike to become 
matters of genuine concern of the managements of both the union and the 
company. 

The research was undertaken with no thesis in mind; there was no 
thought of proving any preconceived opinions, either unionwise or com- 
panywise. The only hypothesis was that if significant differences exist 
between aggrieved and non-aggrieved employees, this type of research 
might identify and describe those differences. 

It is hoped that this research will be of some help to American labor 
and industry in their ceaseless striving to arrive at a better understanding
of their mutual problems. 

Procedure 


A survey of the grievance files of the plant revealed a source of data 
complete in detail for each grievance and in chronological order. The 
personnel records of the plant were also in excellent order and quite 
complete. 

A work sheet was made up which contained two sets of data, (a) the 
pertinent facts of nine items of the grievances, and (b) all available in- 
formation concerning the aggrieved, which consisted of 75 items. From 
the very complete grievance and personnel records of the company 1067 
work sheets were filled in which represented 766 separate grievances of 
327 employees. A number of employees, mostly union officials, each 
filed more than one grievance. The average number of grievances per 
employee of the group having grievances, was 2.3. The first grievance 
filed by an employee was designated as an “initial grievance.” Table 1
shows the distribution of grievances and grievers. 


Table 1

Number of Grievers and Grievances

                               Foundry   Machine Shop

Grievers                         223         104
  [row label lost]               150          92
  [row label lost]                73          12
Grievances                       644         122
  Initial grievances             223         104
  Other grievances               421          18
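
The counts in Table 1 can be checked against the totals given in the text with a few lines of arithmetic. The sketch below is illustrative only; the variable names are invented, and the assignment of 421 and 18 to "other" grievances is inferred from the text rather than read from a surviving row label.

```python
# Counts as they appear in Table 1 and the surrounding text.
grievers = {"foundry": 223, "machine_shop": 104}
grievances = {"foundry": 644, "machine_shop": 122}
initial = {"foundry": 223, "machine_shop": 104}
other = {"foundry": 421, "machine_shop": 18}  # inferred label

total_grievers = sum(grievers.values())
total_grievances = sum(grievances.values())

# Each griever files exactly one initial grievance, so the remainder
# of each unit's grievances must equal the "other" count.
remainders = {unit: grievances[unit] - initial[unit] for unit in grievances}

# Average number of grievances per aggrieved employee.
mean_per_griever = total_grievances / total_grievers
```

The totals agree with the 766 grievances and 327 employees reported in the text, and the average rounds to the stated 2.3 grievances per griever.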
In the foundry agency 223 initial grievances were found and 104 in the
machine shop. Work sheets were made up for two control groups con- 
sisting of 201 foundry employees, selected at random from the personnel 
files, who had not filed a grievance and 100 machine shop non-aggrieved 
employees also selected at random from the personnel files. 

Items on the work sheets were coded, and the resulting information 
was punched on I.B.M. cards. Two sets of data were tabulated from the 
punched cards, data on the grievances and data on the grievers and the 
control groups. The foundry data were separated from that of the ma- 
chine shop. Each bargaining agency then had two sets of data, grievance 
information and personnel information. The grievance data had two 
divisions, that of (a) initial grievances and (b) other grievances. The 
personnel data also had two divisions, (a) aggrieved and (b) non-aggrieved 
employees. 

A statistical analysis was made of the results of the tabulation. The 
figures were expressed in either per cents or medians. A number of items
such as vacation, yearly income, and others were calculated only for a 
twelve month period, the calendar year of 1946. All figures are compara- 
ble, having been equated to take care of variables introduced by a general 
wage raise. 

The difference between each of the respective groupings of grievance 
and personnel data was computed. The standard error of each difference 
was also computed. Fisher’s t statistic was computed by dividing the
difference by the standard error of the difference. An entry from Fisher’s
table was then obtained for each t value and the probability P determined.

Each t value is indicative of a level of significance which may be
interpreted as the probability that a difference as large as the obtained
difference could have occurred if the samples were drawn from the same 
population, or to put it another way, as the probability that the difference 
could have occurred by chance alone. 
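
The computation just described can be sketched for the case of two percentages. The sketch rests on assumptions not stated in the paper: the standard error uses the usual formula for two independent percentages, and the probability P is taken from a large-sample normal approximation instead of Fisher's table. The group sizes are those of the foundry grievers and their control group; the two percentages are invented for illustration.

```python
import math


def se_diff_percent(p1, n1, p2, n2):
    """Standard error of the difference between two independent percentages.

    Assumed formula: sqrt(p1*q1/n1 + p2*q2/n2), with q = 100 - p.
    """
    return math.sqrt(p1 * (100 - p1) / n1 + p2 * (100 - p2) / n2)


def t_ratio(p1, n1, p2, n2):
    """Difference divided by the standard error of the difference."""
    return (p1 - p2) / se_diff_percent(p1, n1, p2, n2)


def two_sided_p(t):
    """Two-sided probability under a normal approximation (an assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


# 223 foundry grievers and 201 foundry non-grievers (from the text);
# the percentages 60 and 50 are hypothetical.
t = t_ratio(60.0, 223, 50.0, 201)
p = two_sided_p(t)
```

With these invented figures the ratio is a little above 2 and P falls below .05, so a difference of this size would be counted as significant at the 5% level under the scheme described next.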

If a difference as large as the obtained difference could have occurred 
as frequently as 5 times in 100 among pairs of samples drawn from the 
same population, the difference is considered significant at the 5% level. 
Since chance alone could account for a difference as large as the obtained 
difference only 5 times in 100, the null hypothesis that the true difference 
is zero can be rejected. A difference significant at the 5% level or lower 
is marked on the tables with a double dagger. 

If a difference as large as the obtained difference could have occurred
as frequently as 10 times in 100 among pairs of samples drawn from the 
same population, the difference is considered significant at the 10% level. 
A difference significant at the 10% level is marked on the tables with a 
dagger. If a difference as large as the obtained difference could have 
occurred as frequently as 20 times in 100 among pairs of samples drawn
from the same population, the difference is considered significant at the 
20% level. A difference significant at the 20% level is marked on the 
tables by one asterisk. 

These asterisk markings of the three levels of significance are made to 
facilitate a rapid identification of the most probably significant items. 
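
The three-level scheme can be stated compactly. The following is a minimal sketch, with an invented function name, that assigns a probability P to the strictest of the 5%, 10%, and 20% levels it satisfies; it says nothing about which printed symbol marks each level.

```python
def significance_level(p):
    """Return the strictest level (0.05, 0.10, or 0.20) that P meets, else None."""
    for level in (0.05, 0.10, 0.20):
        if p <= level:
            return level
    return None
```

A value of P = .07, for example, misses the 5% level but meets the 10% level, while P = .50 meets none of the three and would go unmarked.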

The results of the analysis of data concerning grievances are expressed
by comparing the respective standings of union members and union offi- 
cials on each item. “Initial grievances” are those filed by union members 
on their own behalf, “other grievances” are those filed by union officials, 
usually for a cause furthering the union’s program or the operation of the 
agreement. 

The results of the analysis of all data concerning aggrieved employees 
are expressed by comparing the respective standings on each item of 
the grievers and a control group of employees having no grievances. 

Only those items were used in comparing the respective groups where 
the number of cases involved was large enough for statistical handling. 
Differences between the groups of grievers and non-grievers that are
probably most significant are those in the t column of the tables which are
marked with three asterisks. Values of t which are marked with two
asterisks might be considered significant, but as these values decrease they
are to be interpreted with increasing caution as chance factors are more
apt to be responsible for the difference as t values become smaller.


Results 


Grievance Data of Foundry and Machine Shop Employees. Nine
items concerning grievances were available. Several of the items, such 
as the classification of grievances and the disposition of grievances, had 
subdivisions. Only those items are shown in Table 2 which had large 
enough numbers of cases to justify the computation of differences. 

a. Union officials, as reflected in the grievances they file, do not refer 
to the contract in the wording of their grievances as often as do union 
members. This difference between the two groups is probably greater 
in the foundry unit than in the machine shop.

b. Relative to the nature of grievances more grievances appeared to
be filed by union officials concerning work and jobs than by union mem- 
bers, particularly in the foundry unit, but the differences are not signifi- 
cant. Union members file more grievances concerning pay and wages 
than do union officials. However, of the grievances concerning job and 
work or pay and wages, union members’ grievances show a larger per- 
centage of the latter. This is particularly true in the machine shop where 
the ratio is approximately one to three. Union members in the foundry 
file more grievances on seniority than do their officials. The difference
is not significant in the machine shop where the same relationship appears 
to hold. A higher percentage of machine shop grievances of union mem- 
bers are concerned with seniority than among the foundry group, although 
union officials of the machine shop agency are more concerned with sen- 
iority problems, as reflected in their grievances, than are the foundry 
union officials. Percentage figures on the other six items of the classifi- 
cation of grievances may be found in Table 3. 

c. In the settlement of grievances no significant differences were found 
between the subject groups. Grievances settled in the third step had 
values in the same direction, in favor of the union members, in both the 
foundry and the machine shop. The number of grievances involved in 
the fifth step was too small to make a reliable comparison. The sixth 
step, arbitration, also had too few cases, there being only four with which 
to make any comparisons. 

d. The machine shop grievances of union officials are definitely 
granted by the company more often than those of union members. There 
is no difference between these two groups in the foundry. This would 
seem to indicate that union officials of the foundry unit are not as able in 
formulating and processing grievances as are the union officials in the 
machine shop. The converse is likewise true in the machine shop unit:
more grievances of union members are denied by the company than are 
those of union officials. It is indicated that in the machine shop, the 
grievances that are dropped by the union are those of its members and 
not those of its officials, but the difference is not significant. In the 
foundry an equal number of grievances of both groups seem to be dropped 
by the union. 

Personal Data of Foundry and Machine Shop Employees. Seventeen 
items of personal data were available on the employee’s personnel record. 
These data were compared for the two groups, the grievers and the non- 
grievers. Table 4 gives the figures on the comparisons for these two 
groups in the foundry and in the machine shop. An analysis of the re- 
sults obtained, when the personal data of aggrieved employees of the 
foundry and the machine shop were compared with that of their corre- 
sponding control groups, showed several significant differences between 
the two groups, the grievers and the non-grievers. 

a. Little difference was found between grievers and non-grievers in 
the foundry and machine shop in regard to education. More of the 
machine shop grievers went further than the eighth grade than did the 
non-grievers. 

b. It is indicated that foundry grievers are socially more stable in that 
fewer of them are single, more of them are married and more of them have 
children. A greater number of the foundry grievers have children than
do the machine shop grievers. As the needs of a family group are greater
than those of a family without children and as grievances filed were primarily for
more money, it would seem to follow that in the two machine shop groups 
employees with families should have more grievances. Differences run 
in the same direction as those in the foundry, but they are not significant. 

c. On the application form of the company, applicants hired who
later filed grievances had had more jobs and had worked longer than ap- 
plicants who became non-aggrieved employees. Among the foundry 
group, more of the subsequent grievers had jobs when they applied for 
work at the plant than did the subsequent non-aggrieved employees. The 
reverse was true in the machine shop group where more of the non- 
grievers had jobs at the time of application, but the differences here were 
not very pronounced. Two reasons might be postulated why foundry 
workers who have grievances were looking for different jobs, (a) due to 
more of them having families they were in need of better paying jobs,
(b) as a group they appear to be less stable socially than non-grievers. 

d. A greater per cent of employees born in the South were non-grievers 
rather than grievers in both the foundry and machine shop. A study of 
other places of birth revealed no significant differences between grievers 
and non-grievers although all differences were in the same direction for 
both the foundry and machine shop. 

e. No appreciable differences were found between the grievers and 
non-grievers with respect to weight, height, or age. This is true of both 
the machine shop and foundry groups although the differences found were
in the same direction for both groups. 


Personnel Records Data of Foundry and Machine Shop Employees. Of 
the numerous items taken from the personnel records, those in Table 5 
proved to be most interesting. 


a. More foundry grievers had personnel transactions than non- 
aggrieved employees. This indicates that foundry employees who are 
subject to personnel changes, regardless of the nature of the transaction, 
are most liable to be grievers. 

b. Aggrieved employees have more total net service than do non- 
grievers. The machine shop grievers have over two years more service 
than non-grievers whereas the foundry grievers show approximately one 
year more service than foundry non-grievers. 

c. New employees who later became aggrieved for one reason or an- 
other started at about eighteen cents an hour less than employees who did 
not have grievances. This is true in both the foundry and machine shop. 
However, it is indicated that at the time of the grievance, the aggrieved 
employees were slightly higher in hourly rate than non-aggrieved em-
ployees, although the difference is not significant. 

d. The total wage increase of the grievers from the time of hire to the 
time of grievance was significantly higher than the wage increases received
by the non-aggrieved group for a corresponding period. It is demon- 
strated by this study that both the union and the company were cooperat- 
ing in erasing wage differences between workers, the union in pointing 
out the differences and the company in equating them. 

e. The grievers as a group were more subject to layoff than were the 
non-grievers, particularly in the foundry. However, when it came to 
temporary layoffs, over twice as many non-grievers took temporary lay- 
offs as did the grievers. This would seem to indicate that the plant 
supervision was well aware of problems it could make for itself, but on the 
other hand it has been noted that grievers have considerably more net 
service, hence more seniority, than do non-grievers. 

f. In the matter of skill level of the respective groups, the semi-skilled 
bracket of the foundry had the highest percentage of grievers in it. The 
most grievers fall in the semi-skilled bracket of the foundry and machine 
workers. It is indicated that the semi-skilled level of foundry employees 
is most liable to produce grievers. 

g. There seemed to be little difference between the annual earnings of 
the two groups although it is indicated the grievers in the foundry earned 
slightly more money in the year of 1946 than did the non-grievers. In 
comparing the annual wages of veterans and non-veterans, it is indicated 
that in both the machine shop and foundry veterans who were non- 
grievers earned more per year, than veterans who were grievers, par- 
ticularly in the machine shop group. The same relationship exists though 
not significantly in the non-veteran group. 

h. A study of the credit standing showed that of the foundry group 
the company got many more dun letters on employees who were grievers. 
It is indicated that more garnishments are served on foundry employees 
who are grievers, but the opposite was found within the machine shop 
group, although neither difference is significant. From information the 
personnel department receives from credit stores, it would seem that 
grievers are more heavily burdened with debts than are the non-grievers, 
particularly among the foundry group. 

i. The position of each employee in his respective job class or labor 
grade was studied. In the foundry group more non-grievers had attained 
maximum position in their job class. The opposite was true in the 
machine shop, but not notably so. Of foundry employees in the middle 
of their job class, more were grievers while again the opposite held true in 
the machine shop group in about the same small proportion. Little 
significance can be attached to the few cases found in the minimum
position. 

Medical and Welfare Data of Foundry and Machine Shop Employees. 
The eighteen items on medical and welfare data in Table 6 show several 
distinct differences between the subject groups. 

a. From the standpoint of medical classification, machine shop 
grievers are more healthy or less handicapped as more of them fall in 
“Class A,” which means unrestricted job placement. However, the
grievers, particularly in the foundry, visit the dispensary more often 
than do the non-grievers for medical attention other than for accident 
care. A greater number of grievers file for disability benefits from the 
Employes’ Benefit Association than do non-grievers and a greater number 
of grievers collect disability benefits. This probably explains why more 
grievers belong to the Employes’ Benefit Association than non-grievers, 
particularly in the machine shop. The two groups of grievers are also 
in the dispensary more frequently with shop accidents for which they 
have a much higher percentage of claims for accident benefits than 
do the non-griever groups of both the machine shop and foundry. 

b. Although the majority of employees subscribe to the group life 
insurance plan of the Company, fewer grievers in the foundry have mem- 
bership in the plan than do non-grievers. The reverse is true in the ma- 
chine shop where more grievers hold group life insurance. Grievers in 
both the foundry and machine shop subscribe much more heavily to the 
group savings plan which is a credit-union or lending agency, than do 
non-grievers. Group hospitalization, on the other hand, shows an op- 
posite picture. Where practically all non-grievers carry the group hos- 
pital plan, less than ten per cent of the grievers have hospital plan mem- 
bership. It might be suggested that grievers, as a group, have a greater 
feeling of insecurity or a poorer management of money in that more of 
them use the savings plan. Little, however, is known of the saving 
habits of either group outside of the company plan. 

c. Foundry grievers worked more days in 1945 and 1946 than did non- 
grievers and received more vacation credit. In the machine shop group, 
however, the grievers did not work many more days in the year than 
non-grievers, but they received a great deal more vacation credit. This 
is explained by the greater length of service of the group constituting 
machine shop grievers. 


Summary and Conclusions 


To contribute toward a better understanding of the problems of labor 
and management, as reflected in aggrieved employees and their 
grievances, a statistical analysis was made of the grievances and 
their makers at a large Midwestern industrial plant. The 
plant had two unions, a foundry union of five and one-half years existence 
by the end of 1946, the period covered by this study, and a machine shop 
union thirteen months old prior to December, 1946. Foundry and 
machine shop data were studied separately, the grievances of each being 
separated into two groups (a) initial grievances, those filed by union mem- 
bers and (b) other grievances, those filed by union officials. Only griev- 
ances that had been reduced to writing were used in the study. A group 
of non-aggrieved employees was equated with the aggrieved employees 
as controls and 53 items of personal and personnel data of both groups 
were compared for both the foundry and machine shop agencies. 

In general, relative to grievances, the older foundry union and the 
machine shop union did not differ in many respects. Results of the 
study of grievances showed: 


a. The most frequent grievances concerned pay and wages (30%); the 
next largest group concerned jobs and work (28%), with grievances 
concerning seniority coming third (10%). 

b. Union officials filed the highest per cent of grievances on matters 
of jobs and work; union members filed the highest per cent of grievances 
on seniority and pay and wages. The majority of grievances filed did not 
refer to the contract in any respect; of those which did refer to the con- 
tract, union members and not officials were the more numerous. How- 
ever, in these items concerning grievances the differences were not signifi- 
cant enough to warrant any conclusions. 

c. Only in the machine shop was a significant difference found in 
grievances granted by the company. Here union officials had more of 
their grievances granted than did union members. 

d. This study analyzed 766 separate grievances of 327 employees. 
It was indicated that grievers have held more jobs and have worked longer 
than non-grievers and more of them, in the foundry group, had jobs at 
the time of application to the company than did non-grievers. The 
group of grievers was found to have worked longer for the company than 
had the non-grievers and had accumulated more seniority, particularly 
in the machine shop group as shown by vacation earned. 

e. Grievers started at a significantly lower hourly rate than the non- 
grievers, but were equal at the time they filed their grievances. 

f. Grievers had received much larger wage raises than non-grievers. 

g. Although the annual earnings of the two groups were approximately 
the same, grievers showed a higher skill level than non-grievers. More 
of the machine shop grievers had reached maximum position in their 
respective labor grades; the opposite was true of the foundry group. 

h. The credit standing of grievers probably is lower than that of 
non-grievers, as the grievers, particularly in the foundry, had more dun 
letters in the company files and had been served a few more garnishments. 
As far as the company records went, from demands made on the company 
by credit stores, the foundry grievers were more in debt. 

i. Grievers, as a group, go in very strongly for the group savings plan 
at their plant, a credit-union or lending agency, yet very infrequently 
participate in the group hospitalization plan. 

j. More non-grievers, in the foundry, have membership in the group 
life insurance plan; the opposite is true in the machine shop where grievers 
appear more interested in life insurance. 

k. More grievers subscribe to the Employes’ Benefit Association, and 
this is definitely indicated, for the group paid many more visits to the 
dispensary for medical reasons as well as shop accidents. More grievers 
collect benefits for sickness and accidents as well as compensation for 
disability. More foundry grievers take off time for personal disability 
than do non-grievers. 

l. It was indicated that grievers, as a group, are in better physical 
condition than non-grievers. 

m. More grievers are married and have children than non-grievers, 
particularly in the foundry. 

n. Of employees who had been born in the South the larger per cent 
were non-grievers. 


Evidence is presented which demonstrates that employees of this 
particular Midwestern manufacturing concern show significant differences 
when divided into two groups, one composed of aggrieved employees, the 
other of non-aggrieved employees. The study simply points out the 
degree of difference between the two groups on various personal and 
personnel items; it does not propose to explain the reason for the differ- 
ences found. In order to do this two approaches might be necessary, 
that of opinion research built around the significant items, and secondly, 
a sound clinical study might shed light on some of the reasons grievers 
appear to be a different and possibly less stable group as reflected in their 
medical, accident, and credit records. An analysis of grievances such as 
is here presented might be of aid to both supervision and union officials 
alike in finding where their problems lie. The results may definitely be 
worked into a training program of both groups. The study demonstrates 
that data concerning grievances and their makers are easily subject to 
statistical analysis. It is hoped that the methodology used in this in- 
vestigation will stimulate further research on a broader industrial basis 
and that the results here obtained will bring about a better understanding 
of the problems of industrial employees. 


Received March 8, 1948. 
Early publication. 








Additional Distributions of Test Scores of Industrial 
Employees and Applicants 


Myles H. MacMillan 
Ingersoll Steel Division of Borg-Warner Corporation 


and 


Harold F. Rothe 
Stevenson, Jordan & Harrison, Inc., Chicago, Illinois 


In an earlier paper¹ data were presented to show that applicants for 
industrial jobs often make a distribution of employment test scores that is 
different from the distribution of scores on the same test made by the 
employees against whom the test had been validated. That is, the dis- 
tribution for applicants is shifted toward the higher, or better, end of the 
scale. Three possible variables in the testing situations that may account 
for this shift, namely age, military experience with tests, and combining 
office and shop applicants’ data, were controlled in the previous analysis, 
and were shown to be unrelated to the shift. One other factor was partly 
controlled. This was the possibility that “the word gets around” among 
the supply of potential applicants with the result that only the “better” 
applicants apply. Another suggested reason was that the incentives to a 
good test performance were higher for applicants than they were for the 
employees who had been promised that their jobs would not be affected 
by their test results. It was concluded that this latter phenomenon was 
the reason underlying the shift. 

Some additional data that are relevant to this problem have been 
collected and it is the purpose of the present paper to present these and 
to relate them to the problem. It is believed that these data lend further 
support to the hypothesis that the reason for the shift is one of greater 
test-taking incentivation, and not one of the word getting around.² 


Discussion 


If the hypothesis is true that the word gets around and attracts better 
qualified applicants, it appears logical to assume that the word takes some 
time in getting around. That is, the improvement in applicants’ 
qualifications should appear gradually and not all at once. Thus, the 
improvement should be a gradual one and successive samples of applicants 
should show successively higher distributions. 

1 Rothe, H. F. Distributions of test scores of industrial employees and applicants. 
J. appl. Psychol., 1947, 31, 480-483. 
2 Stromberg, E. L. Testing programs draw better applicants. Person. Psychol., 
1948, 1, 21-29. 

On the other hand, if the reason for the shift is one of incentivation, 
the shift should appear suddenly in a first sample of applicants, and suc- 
cessive samples of applicants should give the same distributions as the 
original sample of applicants, all being equally higher than the original 
distribution for employees. This, of course, depends upon the samples 
being small enough to reflect any shifts that may occur. At the same 
time the samples should be large enough to give statistical significance to 
any results that are analyzed. 


Data from One of The Original Plants 


Data are available from one of the original plants for one of the tests. 
The data presented here are the only ones that are available from the 
original situations because of decreased employment (with the decreased 
turnover) in those plants and because in one instance another test has 
been substituted for one of the original ones. 

The Code Identification Test was originally validated against fifty-six 
employees. The first follow-up analysis was made after 129 applicants 
had been tested and a shifted distribution was found. A second follow-up 
was made when seventy-five more applicants had been tested. The 
distribution for this second group was practically identical with the 
distribution for the first group of applicants. These three distributions 
are shown in Figure 1. 

Fig. 1. Test scores of industrial employees and applicants on Code Identification Test. 

These data are summarized in Table 1 where it can be seen that the 
Critical Ratio between the means of the two successive groups of ap- 
plicants is 0.12. 

Table 1 

Test Scores of Employees and Applicants 

Group           N      Mean    S.D. 
Employees       56     27.8    [illegible] 
Applicants-1    129    34.9    13.85 
Applicants-2    75     35.1    11.49 

C.R., employees vs. applicants: 2.97 
C.R., Applicants-1 vs. Applicants-2: 0.12 





It is apparent that either the word got around immediately, once and 
for all, or else some other variable was operating. It is concluded that 
another variable, the incentivation of the various groups, explains this 
shifting of applicants over employees. 
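The critical ratio between the two applicant groups can be approximated from the summary statistics. The sketch below assumes the conventional definition, the difference between two means divided by the standard error of that difference, sqrt(s1²/n1 + s2²/n2). The paper does not state the formula it used, and the function name `critical_ratio` is only illustrative.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error.

    Assumed formula (not stated in the paper):
    SE = sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return abs(mean_a - mean_b) / se_diff

# Summary values for the two applicant groups as printed in Table 1.
cr_applicants = critical_ratio(34.9, 13.85, 129, 35.1, 11.49, 75)
print(round(cr_applicants, 2))
```

With the rounded means printed in the table this gives roughly 0.11, close to the reported 0.12; the small gap is presumably due to rounding in the published means. The employee comparison cannot be recomputed here because the employees' S.D. is not legible in this copy.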


Data from Other Plants 


Data are also available from another plant in which the Wonderlic 
Personnel Test was administered routinely to all applicants. Some 
applicants were white and others were colored. The presence of a large 
group of colored applicants who live in one area of the city permits an 
ideal situation for the word to get around the normal labor supply of this 
plant if such is to happen at all. 


Table 2 
Test Scores of Negro Applicants 

Group    N      Mean           S.D. 
1        100    [illegible]    7.92 
2        100    [illegible]    6.23 
3        100    [illegible]    7.19 





The first three successive groups of one hundred applicants each, 
Negro and white, were analyzed and are summarized in Tables 2 and 3. 
The critical ratios are, for groups 1 and 2, 0.84, groups 2 and 3, 0.62 
and groups 1 and 3, 1.35. It is especially interesting that these successive 
samples of applicants showed lower, not higher, mean scores. If the 
word got around here, it had a negative effect. This is contrary to the 
expectations of those who believe that the use of tests will 
improve the qualifications of applicants. 

The critical ratios for the white applicants are, for groups 1 and 2, 
3.72, groups 2 and 3, 1.14 and groups 1 and 3, 2.45. Here again the shift 
was downward. 











Table 3 
Test Scores of White Applicants 

Group    N      Mean    S.D. 
1        100    18.7    8.57 
2        100    17.2    8.52 
3        100    17.7    9.16 





The same procedure of testing applicants before attempting to validate 
the tests was used at two other plants. In both instances standardized 
general intelligence tests were used. In one plant the second group of 
applicants had a higher distribution than the first group, with a C. R. of 
.40, and in the other plant the distribution shifted downwards, with a 
C. R. of 2.37. There were about 150 persons in each sample in both of 
these plants. 


Additional Controls Needed 


All of the above data point to the same conclusion, namely, that the 
use of tests has no effect on the qualifications of applicants. The mere use 
of tests does not attract “better” applicants. It is probably true that in 
a few instances some applicant may state that he has come to a specific 
plant because he heard tests are used and only intelligent people work 
there. But these are isolated instances, if they do occur, and are of no 
statistical significance. 

The data presented in these two papers have been collected under 
actual industrial employment office situations. There are still some con- 
trols lacking before this problem can be solved in an experimental manner. 
For example, there is the possibility that the various test administrators 
may have, for some unexplained reason, affected the various testees differ- 
ently. This possibility has been completely uncontrolled although in all 
instances standardized procedures were supposedly used. Nevertheless, 
the personalities of the administrators may have affected these situations. 

It would be desirable to test a group of applicants and at a later date 
to re-test all those who had been hired. If the conclusion reached in 
these papers is correct, it would be expected that the re-test distributions 
would be lower than the employment office distributions for the same 
persons. The writers have heard of one instance where this was done, with 
the results described above, but no data are available to them on that 
point. 

Another point is that the employees described in these papers were 
tested in large groups and the applicants tested either individually or in 
very small groups. Perhaps the employees could have been more highly 
incentivated if they had been tested individually. On the other hand, it is 
possible that there was some social effect that did indeed lead them to get 
relatively high scores. If so, this group effect was apparently not as 
effective as was the incentive of a job that was held up before the ap- 
plicants. 

Conclusion 


The conclusions from present data are substantially the same as the 
conclusions of the original paper. When tests are validated against the 
existing employee force, a follow-up analysis is needed in order to check 
the critical score, if a critical score is used. The mere presence of tests 
in the employment office does not guarantee that applicants will be more 
highly qualified than are the present employees. The tests must be vali- 
dated for the jobs in question. The greater test-taking incentivation of 
applicants appears to account for the shifted distribution of their scores 
as compared with the distribution of employees’ scores. 

Received April 29, 1948. 
Early publication. 








On the Validity and Reliability of the Job 
Satisfaction Tear Ballot * 


Willard A. Kerr 
Illinois Institute of Technology 


The Tear Ballot for Industry, General Opinions, was developed in 1944 
for the purpose of attempting to measure job morale or job satisfaction. 
Items were obtained from examination of the psychological and personnel 
literature, and each item was subjected to the critical appraisal and re- 
vision of a panel of five industrial psychologists. Each word in each re- 
tained item was checked for acceptability at low vocabulary level against 
the Thorndike (3) word list. 

As this test utilizes the original tear method of response described 
elsewhere (1), a saving of from ten to twenty per cent in administration 
time is effected by elimination of the necessity of distributing and col- 
lecting pencils. In the industrial situation, particularly when testing is 
done on company time, this saving becomes significant. A second im- 
portant advantage of the tear method is that the employee is impressed 
with its obvious anonymity when he finds that he is required to write no 
identification nor even any pen or pencil marks which might establish the 
identity of his replies to the questions on the tear ballot. 

An average employee can answer all questions and cast his anonymous 
ballot within two to three minutes. Acceptable reliability is obtained 
not by having a burdensome number of questions, but by utilizing the 
five-point continuum in answering, which, according to the experiments 
of Remmers and Adkins (2) and others, yields higher reliability than the 
use of fewer than five response alternatives. The addition of alternatives, 
in fact, increases reliability as predicted by the Spearman-Brown formula 
until at least a limit of five is reached. Addition of alternatives after five 
is reached seems to extract such sharply decreasing returns in increased 
reliability that the increase in administration and scoring time is probably 
not worthwhile. 
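The diminishing returns described above follow from the shape of the Spearman-Brown prophecy formula, r_k = k·r / (1 + (k − 1)·r). The sketch below is illustrative only: the baseline reliability of 0.50 is hypothetical, and treating added response alternatives as a lengthening factor k is the analogy drawn in the passage, not something the code establishes.

```python
def spearman_brown(r, k):
    """Predicted reliability when a measure with reliability r is lengthened k-fold."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return (k * r) / (1.0 + (k - 1.0) * r)

# Hypothetical baseline reliability of 0.50; each added unit of length gains less.
for k in (1, 2, 3, 4):
    print(k, round(spearman_brown(0.50, k), 3))
```

Each successive step in k raises the predicted reliability by a smaller amount, which is the pattern the text appeals to when it says additions beyond five alternatives are probably not worth the extra administration and scoring time.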

Validity 


It seems to be believed generally that the problem of validating a job 
morale test or any other type of attitude scale is a very difficult one. This 
belief has some basis in fact, but the difficulties are not insurmountable, 
merely numerous; such tests ordinarily require validation from several 
different angles or viewpoints. Attitude tests, particularly, have so 
many potential validities (even the same tests applied to the same sample) 
that one must specify “validity of what for what” when discussing 
attitude test validation. It is obvious, on the other hand, that all of the 
potential validities of any attitude test (or, for that matter, of any test 
or even any non-psychological measuring instrument) cannot possibly 
ever be determined. The validation reported herewith is one kind of 
validation, provided from a particular viewpoint of considerable 
psychological significance for the potential efficiency of the job 
satisfaction tear ballot in indicating festering spots of discontent 
throughout a factory, office, or large retail establishment. 

* The author acknowledges the invaluable assistance of D. R. Elrod, G. R. Hugman, 
D. K. Kohler, W. A. McNichols, Jr., and Mervin Rudolph in completing these studies. 
The Tear Ballot for Industry is available from the Industrial Psychology Laboratory, 
Illinois Institute of Technology, Chicago 16, Illinois. 


ANSWER BY TEARING! 

THE TEAR BALLOT FOR INDUSTRY, GENERAL OPINIONS 
by Willard A. Kerr, Ph.D. 

DEAR EMPLOYEE: It is the obligation of each of us in this company to try to 
improve the happiness and welfare of others. You are one of the large random 
sample of employees [...] to cooperate in this sincerely constructive 
scientific survey of opinions. No one will ever [...] You don't sign your 
name; in fact, you [...] your writing on this new [...] sincere and honest 
expression of your opinion is requested. 

DIRECTIONS: Check one answer to each question by TEARING THE ARROWHEAD 

1. Does the company make you feel that your job is reasonably secure as long 
as you do good work? 
   1. Yes, job seems wholly secure 
   2. Usually 
   3. About half the time 
   4. Rarely 
   5. No, job seems very insecure 

2. In your opinion, how does this company compare with others in its interest 
in the welfare of employees? 
   1. It's tops, shows more interest than any other 
   2. Slightly above average 
   3. It is average 
   4. Slightly below average 
   5. Poor, shows less interest than other plants 

3. How does your immediate superior compare with other managers, foremen, or 
section leaders as to supervisory ability? 
   1. Among the best 
   2. Slightly above average 
   3. Average 
   4. Slightly below average 
   5. Among the worst 

4. Considering your work, are your working conditions comfortable and 
healthful? 
   1. Yes, excellent 
   2. Slightly above average 
   3. Average for type of work 
   4. Slightly below average 
   5. No, very bad 

5. Are most of the workers around you the kind who will remember you when you 
pass them on the street? 
   1. Yes, they are very friendly 
   2. Yes, usually 
   3. About half the time 
   4. Rarely 
   5. No, they are unfriendly 

6. Do you think your income is adequate for your living needs? 
   1. Yes, enough for enough luxuries 
   2. Slightly above average 
   3. Just enough for average comfort 
   4. Barely enough to get by on 
   5. Much less than enough to get along on 

Copyright 1944 by INDUSTRIAL OPINION INSTITUTE 
All rights reserved. Printed in U.S.A. 

[...] copyrighted. The reproduction of any part of it by mimeograph, or in any 
other way, whether the reproductions are sold or are furnished free for use, is 
a violation of the copyright law. For origin and first use of the tearing 
method, see Kerr, W. A., Where They Like to Work, JOURNAL OF APPLIED 
PSYCHOLOGY, 1943, 27, 438-442. 

7. Do you feel that you have proper opportunity to present a problem, complaint 
or suggestion to the management? 
   1. Yes, always 
   2. Usually 
   3. On occasion 
   4. Rarely 
   5. No, never 

8. Do you have confidence in the good intentions of the management? 
   1. Yes, it is sincere 
   2. Usually 
   3. Half the time 
   4. Not often 
   5. No, it is insincere 

9. Do you have confidence in the good sense of the management? 
   1. Yes, it is capable and efficient 
   2. It is usually efficient 
   3. Half the time 
   4. It is often inefficient 
   5. No, it is stupid and inefficient 

10. What effect is your experience with the company having upon your personal 
happiness? 
   1. Improves it greatly 
   2. Slightly beneficial 
   3. Little or no effect 
   4. Slightly disturbing 
   5. Extremely harmful 

11. Special problems: Please indicate any or all of the following problems which 
are really sources of frequent annoyance to you: 
   1. Inconvenient or undependable transportation 
   2. Unfairness in promotion policy 
   3. Lack of time to take care of personal business 
   4. Lack of attention to employee recreation 
   5. Broken promises on part of supervisors 
   6. Family troubles at home 
   7. Poor housing conditions or excessive rents 

NOTE: We can all [...] Just report the facts. 

WE SHALL APPRECIATE YOUR PROPERLY TEARING each of the following tabulation items: 

12. Your sex: Male; Female 
13. [item wording illegible]: [first alternative illegible]; Non-Office 
14. Are you a supervisor or foreman? Yes; [second alternative illegible] 
15. Your hours of work (chiefly): Day shift; Swing shift; Night shift; 
Rotation shift system 

Fig. 1. Sample of The Tear Ballot for Industry. 

The rationale, not of the tear ballot itself, but of this particular 
validation, may be stated as follows. To the extent that job adjustment 
or maladjustment tendencies are persistently present within an employee’s 
personality pattern over a period of years, measurement of job satis- 
faction on the present job will possess validity for predicting past and 
future as well as present job satisfaction. Operationally, it is almost 
impossible to relate (ethically) the anonymously obtained present job 
satisfaction of a worker with his future job satisfaction. However, it is 
possible and practicable to apply this validation approach to the 
employee’s work history. To pursue this objective a valid measure of 
past job satisfaction is of course needed. In this study the assumed valid 
measure of past job satisfaction is past tenure (turnover reversed) rate 
which is computed simply by dividing the number of years a worker has 
been in the civilian labor market by the number of employers worked for 
in the same period of years. While this is not a perfect measure of past 
job satisfaction because of the numerous factors other than personal 
morale which produce individual cases of turnover, it nevertheless is a 
highly useful operational criterion in view of the fact that turnover is so 
often the economically wasteful result of being “fed up” with the boss, 
company, work associates, the job itself, etc. 

In other words, among those scoring low in satisfaction with present 
employment are some workers who have been so dissatisfied in past jobs as 
to have been exceptionally frequent job turnover cases in the past. Theo- 
retically, therefore, an efficient job morale index should be related to some 
extent with the past turnover history of a reasonably representative 
sample of employees. It is apparent that this validity demand is a 
conservative rather than an easy one because, first, it asks that a rela- 
tionship be found when the criterion admittedly is to some large extent 
a function of irrelevant factors, and, second, it asks that the test predict 
not satisfaction on present job (which, if done, actually is enough) but 
past turnover history as well. 

In order to determine the ability of The Tear Ballot for Industry to 
predict past turnover rate of a reasonably random sample of wage- 
earners, a supplementary tear ballot was prepared and stapled to the front 
of the regular tear ballot. The supplementary ballot explained that “we 
are attempting to poll a random sample of employees in all leading 
industries and types of work” and requested the respondent to indicate the 
one of the fifteen industries in which he is employed, “total years of 
experience which you have had in all your civilian jobs together,” and 
“during these same total years, how many companies or institutions did 
you work for?” 

A total of four trained interviewers distributed themselves in public 
places over a large southern city and administered anonymously the 
regular ballot and its attached supplement to a total of 98 wage earners distributed 
by industries as follows: agriculture 1, building and construction 11, 
distributing and selling 22, education and research 5, finance and banking 
1, forestry and fishing 1, government 5, manufacturing 24, mining and 
minerals 2, printing and publishing 2, recreation 1, service occupations 5, 
telephone and telegraph 7, transportation 9. 


Table 1 

Pearsonian Coefficients of Correlation between Tenure Rate and Job Satisfaction 
Items in The Tear Ballot for Industry 

Note: Coefficients significant at 1% level are set in bold face type. Remaining 
ones significant at 10% level. 

 1. Does the company make you feel that your job is reasonably secure as long 
    as you do good work?                                                      .45 
 2. In your opinion, how does this company compare with others in its interest 
    in the welfare of employees?                                              .14 
 3. How does your immediate superior compare with other managers, foremen, 
    or section leaders as to supervisory ability?                             .18 
 4. Considering your work, are your working conditions comfortable and 
    healthful?                                                                .26 
 5. Are most of the workers around you the kind who will remember you when 
    you pass them on the street?                                              .63 
 6. Do you think your income is adequate for your living needs?               .26 
 7. Do you feel that you have proper opportunity to present a problem, 
    complaint, or suggestion to the management?                               .28 
 8. Do you have confidence in the good intentions of the management?          .33 
 9. Do you have confidence in the good sense of the management?               .32 
10. What effect is your experience with the company having upon your personal 
    happiness?                                                                .17 
    Total satisfaction score, unweighted                                      .25 
    Total satisfaction score, weighted                                        .36 





A job tenure rate (reverse of job turnover) was then computed for 
each wage-earner by dividing his total time in the civilian labor market 
by the number of jobs held during the same period of time. This job 
tenure rate was utilized as an independent criterion against which to cor- 
relate the individual job satisfaction items of The Tear Ballot for Industry. 
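The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the respondent records are hypothetical (the individual responses are not given in the paper), item scores are assumed to be keyed so that a higher number means greater satisfaction, and the function names `tenure_rate` and `pearson_r` are illustrative.

```python
import math

def tenure_rate(years_in_labor_market, employers):
    """Average years per employer: total civilian working years / employers worked for."""
    if employers <= 0:
        raise ValueError("employers must be a positive count")
    return years_in_labor_market / employers

def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical respondents: (years worked, employers, score on one ballot item).
# Scores are assumed keyed so that higher means more satisfied.
respondents = [(20, 2, 5), (12, 4, 3), (6, 6, 2), (15, 3, 4), (8, 8, 1)]
rates = [tenure_rate(years, employers) for years, employers, _ in respondents]
scores = [score for _, _, score in respondents]
print(round(pearson_r(rates, scores), 2))
```

A worker with twenty years in the labor market and two employers has a tenure rate of 10 years per employer; one with eight years and eight employers has a rate of 1. Correlating these rates with each item's scores across respondents yields one coefficient per item, which is the form of the entries in Table 1.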










As indicated in Table 1 the past tenure item validity coefficients range from 
.14 to .63 among the ten principal items. The responses of Item 11 (this 
item included in the test for additional diagnostic use) correlated from 
.06 to .60 with the criterion, but these correlations are lacking in expected 
consistency, indicating that the score on Item 11 should not be included 
in the total score although a tabulation of replies to Item 11 does have 
diagnostic usefulness. Inspection of Table 1 reveals that seven of the 
ten principal items correlate significantly, at the one per cent level of 
confidence, with the criterion, while the remaining items are all signifi- 
cant at the ten per cent level or better. The unweighted total score corre- 
lates significantly at the ten per cent level of confidence, but when the 
test items are weighted for the specific purpose of predicting past tenure 
rate, the total score is then found to yield a validity coefficient of .36 with 
the criterion. While not high, this coefficient is approximately the same 
as usually is obtained, for example, between the better industrial vision 
test batteries and success on the job in near-point acuity operations. Of 
course, verification of these findings should be made with additional 
groups of workers before definite conclusions are warranted. 

The highest coefficient in Table 1 is that between tenure rate and 
opinion of work associates, suggesting the great importance of the presence 
or absence of strong inter-personal ties in preventing or causing employee 
separation. In this particular validation, separation history seems nota- 
bly more related with that type of emotional security associated with 
being surrounded with friends than with either job security feeling or 
evaluation of adequacy of present wages. This finding suggests that the 
ability to make and retain friendships is the stable personality factor 
(longitudinal) involved most in job adjustment. Probably the relatively 
negligible correlation between separation history and opinion of present 
supervisor is obtained because this other aspect of emotional security 
is generally more impersonal, formal, and less intimate than is the 
typical employee “pals” relationship. 


Reliability 

While an industrial morale test is intended, because of its anonymous 
nature, only for departmental or factory rather than individual diagnosis 
and prediction—and thus theoretically does not require the level of consistency 
of measurement desired in instruments for individual prediction 
—the accumulated evidence on the reliability of The Tear Ballot for 
Industry indicates it to be as reliable as many psychological tests which 
have been found to be of value for individual prediction. 

Split-half reliability coefficients corrected by the Spearman-Brown 
prophecy formula, have been computed on eight different employee 
groups. These coefficients, ranging from .65 to .82, follow (preceded by 
identification and number of cases): 84 ship carpenters .82; 20 male retail 
supervisors .68; 13 female retail supervisors .80; 7 male retail office em- 
ployees .65; 70 female retail office employees .80; 58 male retail clerks 
.76; 86 female retail clerks .68; 125 female operators in a shirt factory 
.73. The median coefficient in this accumulated experience is .75 which 
suggests a satisfactory level of internal consistency for group diagnosis 
and prediction. 
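
As a hedged illustration of the reliability procedure, the sketch below applies the standard Spearman-Brown step-up for a test of doubled length, 2r / (1 + r), and recomputes the median of the eight corrected coefficients listed above. The uncorrected half-test value of .60 is hypothetical; the paper reports only corrected coefficients.

```python
from statistics import median


def spearman_brown(half_test_r: float) -> float:
    """Step a half-test correlation up to full-test length: 2r / (1 + r)."""
    return 2 * half_test_r / (1 + half_test_r)


# Corrected coefficients reported in the text for the eight employee groups.
reported = [0.82, 0.68, 0.80, 0.65, 0.80, 0.76, 0.68, 0.73]
median_reliability = median(reported)  # midpoint of .73 and .76

# Hypothetical uncorrected half-test correlation, for illustration only.
corrected = spearman_brown(0.60)
```

With eight values the median falls midway between .73 and .76, that is, about .745, which agrees with the .75 reported.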

Summary 


1. A brief measure of job morale, The Tear Ballot for Industry, util- 
izing the tear method of response, was constructed at low vocabulary 
level and administered to various business and industrial groups. 

2. The probability hypothesis was advanced that for certain person- 
ality determinants of job morale, the present is psychometrically an aver- 
age of recent past and near future; that some of the variance in job morale 
is a function of these relatively stable personality characteristics; and 
that for these reasons a valid test of job morale will correlate significantly 
with past turnover rates of wage earners. 

3. This validity hypothesis is tested in a random sample survey of 
98 wage earners in a large southern city. A reversed turnover rate was 
obtained on each respondent along with the anonymous Tear Ballot. 
This tenure-rate criterion was found to correlate significantly at the 10% 
level of confidence with each of the items of the test, even though much 
of the variance of the criterion is due to irrelevant and uncontrollable 
factors. The instrument successfully meets this test of validity. 

4. In determination of test reliability on administration to eight 
different business and industrial groups, a median coefficient of relia- 
bility of .75 is obtained. 


Received October 21, 1947. 


A Proposed Short Form of the Kuder Preference Record * 


Ray W. Miles 
Louisiana State University, Baton Rouge, La. 


Anyone who has used the Kuder Preference Record¹ very extensively 
has found occasions when he wondered if some shorter procedure might 
be found to give a systematic record of one’s vocational interests. Each 
item in the Preference Record consists of three activities from which the 
subject must choose one he likes most and one he likes least. Three 
comparisons are necessary for each item: the first activity with the second, 
the first with the third, and the second with the third. There are four- 
teen items to the page and twelve pages in the booklet. Thus the subject 
must make forty-two choices for each page or 504 choices in completing 
the Preference Record. The average college student completes the task 
in forty or forty-five minutes but those with less education, who read 
and think at a slower rate, often take much longer. Occasionally a 
counselor wishes a record of the vocational interests of a subject with 
very limited education who takes an undue amount of time to make his 
choices. It is obvious that one could complete three pages in about one- 
fourth of the time required for twelve pages. 
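
The counts in this paragraph can be checked with a few lines of arithmetic. The constant names below are illustrative only; the figures are those given in the text.

```python
ACTIVITIES_PER_ITEM = 3
ITEMS_PER_PAGE = 14
PAGES_IN_BOOKLET = 12
SHORT_FORM_PAGES = 3  # pages seven, eight, and nine

# Three activities give three pairwise comparisons: (1,2), (1,3), (2,3).
comparisons_per_item = ACTIVITIES_PER_ITEM * (ACTIVITIES_PER_ITEM - 1) // 2
choices_per_page = comparisons_per_item * ITEMS_PER_PAGE
choices_full_record = choices_per_page * PAGES_IN_BOOKLET
choices_short_form = choices_per_page * SHORT_FORM_PAGES
fraction_of_full_record = choices_short_form / choices_full_record
```

The short form thus asks for 126 of the 504 choices, one-fourth of the full record, on the assumption that every page takes about the same time.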

Examination of the Preference Record and of the answer pad indicates 
that certain pages might give an expression of one’s interests, and that the 
completion of the whole record might not always be necessary. Answer 
pads that had been completed by thirty-five men were carefully examined 
to find which pages might serve as a short form of the test. It was found 
that responses for the items on pages seven, eight, and nine yielded partial 
scores which, for every field of interest, bore rather constant ratios to 
actual scores. A count was then made to find the total possible score 
for each section or each field of interest and the partial score possible in 
each section if only pages seven to nine inclusive were used. By this 
process of counting it was found that the ratios between possible scores 
on the entire record and partial scores one might possibly make on the 
three pages corresponded rather closely to ratios already found by inspection 
of thirty-five completed answer pads. Possible scores and 
suggested weights are shown in Table 1. It seems probable that a subject 
might answer the items on pages seven, eight, and nine and that his 
partial scores might then be weighted to give approximately the same 
scores he would have made had he answered all twelve pages. Weights 
proposed for these partial scores, shown in Table 1, are as follows: 
Mechanical, 5; Computational, 3; Scientific, 5.5; Persuasive, 3.5; Artistic, 
3.5; Literary, 4.5; Musical, 5.5; Social Service, 3.5; and Clerical, 
4. It was found that in nearly every case the total actual score was not 
greatly different from the product obtained by multiplying the partial 
score by the weight suggested here. 

* The assistance of Dr. Howard Turner of Southwestern Louisiana Institute, Lafayette, 
Louisiana in the completion of this study is gratefully acknowledged. 

¹ G. Frederic Kuder, Preference Record, Chicago: Test Service Division, Science 
Research Associates, 1942. 











Table 1 

Data from Which Weights Were Derived 

                      Total Possible    Possible Score,    Weight Given to 
Key                   Score             pp. 7-8-9          Partial Score 

1. Mechanical         192               39                 5 
2. Computational      111               36                 3 
3. Scientific         168               30                 5.5 
4. Persuasive         210               60                 3.5 
5. Artistic           153               42                 3.5 
6. Literary           159               36                 4.5 
7. Musical             69               12                 5.5 
8. Social Service     206               60                 3.5 
9. Clerical           177               45                 4 
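
A short sketch may show how the weights in Table 1 relate to the possible scores, and how a weighted score would be obtained. The function names and the partial score of 16 are hypothetical. The ratio of total possible score to possible score on the three pages is offered only as a check, since the paper says the weights also drew on inspection of completed answer pads.

```python
# (total possible score, possible score on pages 7-9, proposed weight), from Table 1.
TABLE_1 = {
    "Mechanical": (192, 39, 5.0),
    "Computational": (111, 36, 3.0),
    "Scientific": (168, 30, 5.5),
    "Persuasive": (210, 60, 3.5),
    "Artistic": (153, 42, 3.5),
    "Literary": (159, 36, 4.5),
    "Musical": (69, 12, 5.5),
    "Social Service": (206, 60, 3.5),
    "Clerical": (177, 45, 4.0),
}


def possible_score_ratio(key: str) -> float:
    """Ratio of the total possible score to the possible score on pages 7-9."""
    total, partial, _ = TABLE_1[key]
    return total / partial


def weighted_score(key: str, partial_score: float) -> float:
    """Estimate a full-record score by multiplying the partial score by its weight."""
    return partial_score * TABLE_1[key][2]


# Largest distance between any proposed weight and the corresponding ratio.
largest_gap = max(
    abs(possible_score_ratio(key) - weight) for key, (_, _, weight) in TABLE_1.items()
)

# Hypothetical subject with a partial Mechanical score of 16.
example = weighted_score("Mechanical", 16)
```

Every proposed weight lies within a quarter of a point of the ratio of possible scores, the Musical key (69/12 = 5.75 against a weight of 5.5) being the farthest.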





Answer pads completed by 205 men representing an educational 
range from the fourth grade to the Bachelor’s degree were taken for a 
more careful examination, to find the reliability of the scores that would 
have been obtained by using only pages seven through nine and weighting 
these partial scores as suggested above. It was found that in keys 1, 2, 
3, and 5, the actual scores were slightly higher on the average than the 
weighted scores. In keys 4, 7, 8, and 9, actual scores were slightly 
lower on the average than weighted scores. In key 6, however, weighted 
scores tended to be considerably higher than actual scores; there was a 
tendency for this to be true with the higher scores more often than with 
the scores that were about average or below. Comparisons of actual 
scores and weighted scores are shown in Table 2. 

It is worthy of note that weighted scores were more variable than 
actual scores. There was a tendency for one who made a low actual 
score on a key to make a relatively lower partial score on the three pages 
studied. Also, those who made a higher actual score on a key tended to 
get still higher scores on that key when their partial scores were weighted. 
However, this is not a disadvantage, for the chief use of the Preference 
Record is to point out preference areas within which one should find 
occupations to investigate. The counselor’s chief interest is to find in 
which preference areas the subject has a greater than average degree of 
interest. This purpose can be served if the subject answers only pages 
seven through nine and if the partial scores are weighted as suggested 
above and percentiles found for the weighted scores. 

Table 2 

Means and Standard Deviations, Actual Scores and Weighted Scores, 
Based on 205 Cases 

                      Actual Scores       Weighted Scores     Mean 
Key                   Mean      Sigma     Mean      Sigma     Difference 

1. Mechanical         83.60     20.80     80.80     22.10       2.8 
2. Computational      35.85      8.55     32.65     10.20       3.2 
3. Scientific         60.00     11.35     59.60     13.35        .4 
4. Persuasive         74.70     14.70     75.50     16.80       -.8 
5. Artistic           49.60     13.50     47.80     16.00       1.8 
6. Literary           43.50     11.80     49.75     17.30      -6.25 
7. Musical            17.52      9.15     19.30     11.00      -1.78 
8. Social Service     74.40     14.70     76.10     19.00      -1.70 
9. Clerical           56.80     13.80     57.70     16.80       -.9 
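
The figures in Table 2 lend themselves to a brief check. The sketch below recomputes each mean difference as the actual mean minus the weighted mean, and confirms that the weighted sigma exceeds the actual sigma on every key. The variable names are illustrative; the data are the table's own.

```python
# (actual mean, actual sigma, weighted mean, weighted sigma), from Table 2.
TABLE_2 = {
    "Mechanical": (83.60, 20.80, 80.80, 22.10),
    "Computational": (35.85, 8.55, 32.65, 10.20),
    "Scientific": (60.00, 11.35, 59.60, 13.35),
    "Persuasive": (74.70, 14.70, 75.50, 16.80),
    "Artistic": (49.60, 13.50, 47.80, 16.00),
    "Literary": (43.50, 11.80, 49.75, 17.30),
    "Musical": (17.52, 9.15, 19.30, 11.00),
    "Social Service": (74.40, 14.70, 76.10, 19.00),
    "Clerical": (56.80, 13.80, 57.70, 16.80),
}

# Mean difference: actual mean minus weighted mean, rounded to two places.
mean_difference = {
    key: round(actual_mean - weighted_mean, 2)
    for key, (actual_mean, _, weighted_mean, _) in TABLE_2.items()
}

# Ratio of weighted sigma to actual sigma; values above 1 mean more spread.
sigma_ratio = {
    key: round(weighted_sigma / actual_sigma, 2)
    for key, (_, actual_sigma, _, weighted_sigma) in TABLE_2.items()
}

weighted_more_variable = all(ratio > 1 for ratio in sigma_ratio.values())
```

The Literary key stands out on both counts, with the largest mean difference and the largest ratio of sigmas.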

Coefficients of correlation between the actual score and weighted 
scores for 205 cases are shown in Table 3. These seem sufficiently high 
to justify the use of only three pages of the Preference Record with those 
very slow readers who would have difficulty in completing the whole 
record. There might also be justification for using a short form of three 
pages with a larger group when the time for testing is limited. 





Table 3 

Coefficients of Correlation Found Between Actual Scores and Weighted Scores for 
205 Men Ranging from Fourth Grade Education to College Graduates 

Key                   Correlation 

1. Mechanical         .86 
2. Computational      .91 
3. Scientific         .76 
4. Persuasive         .85 
5. Artistic           .85 
6. Literary           .91 
7. Musical            .89 
8. Social Service     .86 
9. Clerical           .84 

Summary 


1. Three pages of the Preference Record can be taken in one-fourth of 
the time required for the whole record. 

2. Pages seven, eight and nine of the Preference Record will yield 
partial scores that are indicative of the total scores a subject would make 
if he completed the record. 

3. Partial scores obtained can be weighted to give scores approxi- 
mately the same as total scores. Weights for the various keys are: 
Mechanical, 5; Computational, 3; Scientific, 5.5; Persuasive, 3.5; Ar- 
tistic, 3.5; Literary, 4.5; Musical, 5.5; Social Service, 3.5; and Clerical, 4. 
Percentiles may then be found, using the weighted scores. 

4. Coefficients of correlation between actual scores and weighted 
scores for 205 unselected men were found to range from .76 to .91 for the 
different scoring keys. 

5. The use of this short form can effect a considerable saving in time. 


Received November 8, 1947. 





Methods for Determining Patterns of Leadership Behavior in 
Relation to Organization Structure and Objectives * 


Ralph M. Stogdill and Carroll L. Shartle 
The Ohio State University 


The Personnel Research Board of Ohio State University has under- 
taken a series of studies under the title ‘Leadership in a Democracy.” 
One phase of these studies includes an investigation of executive positions 
and organization structures in industrial, military, educational, and civil- 
ian governmental organizations. The aims of this research are to de- 
velop improved methodology for studying leadership, to establish criteria 
for judging it, and to prepare information and techniques which may be 
useful in selecting and training persons who may occupy leadership 
positions in various types of organization structures. 

The studies are interdisciplinary in character, involving the points 
of view of various sciences, particularly economics, psychology, and 
sociology.¹ 

The objectives to be accomplished, and the postulates which deter- 
mine the methods employed in this research, have been formulated in 
broad, general terms so as to provide scope for investigation. A preliminary 
survey² of the experimental literature suggests that leadership 
is not a unitary human trait, but is rather a function of a complex of 
individual, group, and organizational factors in interaction. Leadership 
resides in individuals, but only by virtue of their interaction with other 
persons. Leadership must, therefore, be studied as a relationship be- 
tween persons, and as an aspect of organizational activities, structures, 
and goals. A comprehensive formulation of the problem is required in 
order to take these factors into account. 

The methods being developed for these studies represent a rather 
marked departure from those usually employed for the investigation of 

* This particular study is a cooperative contribution of the U. S. Navy, Office of 
Naval Research, and The Ohio State University Research Foundation. The opinions 
presented are those of the authors, and should not be regarded as having the endorse- 
ment of the Navy Department. 

¹ The Leadership Studies staff includes C. L. Shartle, Professor of Psychology, 
Director; Alvin E. Coons, Assistant Professor of Economics, Melvin Seeman, Instructor 
in Sociology, and Ralph M. Stogdill, Research Associate in Psychology, Associate 
Directors. 

² Stogdill, R. M. Personal factors associated with leadership: a survey of the literature. 
J. Psychol., 1948, 25, 35-71. 
problems in leadership. For this reason, it seems desirable to answer at 
the outset certain questions that arise, and to state certain hypotheses 
which determine the design of our methods. The studies are proceeding 
on the assumption that leadership is a process of interaction between 
persons who are participating in goal oriented group activities. Three 
concepts are implied in this assumption which should be made explicit. 
The first is that leadership resides in specific persons. The second is that 
leadership is an aspect of group organization, and the third is that leader- 
ship is concerned with attaining objectives. 

If leadership is concerned with goal oriented group activities, it 
seems appropriate to study those members of an organization who deter- 
mine goals and objectives and who control the means by which these 
goals are attained. It is thus assumed that leadership in some form exists 
in top administrative positions, as well as at other levels in the organization. 
The question as to whether leaders or executives are being studied 
appears to be a problem at the verbal level only. 

It is assumed that it is proper and feasible to make a study of leader- 
ship in places where leadership would appear to exist and that if a person 
occupies a leadership position he is a fit subject for study. One further 
assumption that has been made is that leadership is related to getting a 
job done and that it is therefore appropriate to study the work patterns 
of the leaders and of the followers and the working relationships among 
the members of the organization. The soundness of these assumptions 
is being tested in the research. 

The methods and procedures of the leadership studies may be stated in 
general terms as follows: 


1. The first step has been to appraise the literature in the field, to 
formulate hypotheses which seem basic to the problem, and to develop 
methodology for the testing of these assumptions. 

2. The second step is to discover what leaders do. The facts 
concerning leadership activities and organization structures are ob- 
tained primarily by means of the interview, supplemented by direct 
observation, by questionnaires, and by a study of organization manu- 
als and other materials. The initial interview requires approximately 
three hours with each executive. A modified job analysis is made for 
each position. Sociometric methods are applied in analyzing organi- 
zation structures in relation to leadership activities. A number of 
new techniques are in the process of development. The Navy studies 
here described are primarily in the second stage, which involves the 
accumulation of data and the development and improvement of 
methodology. 

3. The third step involves the development of methods for the 
analysis of data, in order to discover the relationships between such 
factors as responsibilities, authority, work patterns, level in the or- 
ganization, compensation, persons with whom most time is spent, 
proportion of time spent in individual effort, methods of getting work 
done, methods of working with staff, type of organization structure 
and objectives of the organization. 

4. The fourth step, as now projected, will be undertaken after 
data have been accumulated from the study of a variety of organiza- 
tions. The ultimate objectives, as previously indicated, are the 
development of criteria for evaluating leadership in various types of 
organizations, and the preparation of information and techniques 
which will be useful in the selection and training of leaders for various 
types of situations. 


While a strong effort has been made to state objectives in terms which 
would not delimit or over-structure the conception of the leadership 
problem, the research procedures have not remained unstructured. In 
fact, the stated objectives can only be accomplished through the careful 
integration of a variety of research approaches including job analysis, 
organizational analysis, the interview, questionnaires, attitude scales, 
sociometrics and other methods and techniques. 

The methods being developed are designed to study formal organiza- 
tion as a complex of relationships and processes. An attempt is being 
made to analyze as completely as possible the interrelationships which 
determine leadership status. Not all the factors which define leader- 
ship are easy to describe in quantitative terms. However, a strong effort 
is being made to reduce all data to terms which will permit quantitative 
treatment. 


Progress has been made in the quantification of the following variables: 


1. Level in the organization (usually defined by location of an executive’s 
position on an organization chart.) 

2. Responsibility patterns (may be defined by law, by organization manuals, 
or by common understanding in absence of above). 

3. Sociometric scores (defined in terms of the number of times an executive 
is mentioned as being one with whom most time is spent in getting work done). 

4. Work patterns (defined in terms of what the executive actually does and 
the methods he employs in carrying out his duties). 

5. Personal contacts (defined in terms of executive’s own estimate of per- 
centage of time spent with persons, as opposed to individual effort). 

6. RAD Index (defined in terms of an executive’s estimate of his own 
responsibility and authority status and of the delegation of authority to his 
subordinates). 

7. Methods of working with staff (defined in terms of a rating scale applied 
to an executive’s statement of his methods for getting the best work out of his 
staff). 

Thus far in the Navy project, three staff organizations have been 
studied. In the first two, a stratified sample was studied. In the third, 
all commissioned officers were studied. 

In order to illustrate the application of methods, the results obtained 
from the study of a single staff are presented. This is a Naval Command 
staff, the primary mission of which is the coordination of a wide variety 
of administrative activities within the shore establishment. Twenty- 
four top line and staff positions were studied. Six levels in the organi- 
zation structure were represented. No civilian executives were included. 


Working Relationships 


Each officer interviewed was asked to name in rank order the persons 
with whom he spent the most time in the process of getting work done. 
The resulting lists provide a basis for making a sociometric study of 
working relationships among the various members of the staff. Working 
relationships among the staff members, as revealed by sociometric dia- 
grams, depart rather markedly in some departments from the formal 
organization chart. It is apparent that organization manuals and organi- 
zation charts define responsibilities and lines of authority, rather than the 
informal organization of day-to-day working relationships. Sociometric 
ratings reveal some tendency for a concentration of contacts in those 
officers who are most actively engaged in carrying out the major policies 
of the organization at the time of the study, regardless of their military 
rank or level in the organization structure. However, sociometric ratings 
correlated +.57 with level in the organization scale. The correlations of 
sociometric scores with other factors are shown in the course of discussion. 
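
One way to picture the sociometric score is as a simple tally of mentions. The sketch below assumes each officer's list is stored under his name; the officers and their lists are invented for illustration and are not data from the Navy study.

```python
from collections import Counter

# Hypothetical interview data: each officer lists, in rank order, the persons
# with whom he spends the most time getting work done. Names are invented.
nominations = {
    "Officer A": ["Officer B", "Officer C"],
    "Officer B": ["Officer A", "Officer C"],
    "Officer C": ["Officer B"],
    "Officer D": ["Officer B", "Officer A"],
}


def sociometric_scores(lists_by_respondent):
    """Count how many times each person is named by the other respondents."""
    counts = Counter({name: 0 for name in lists_by_respondent})
    for respondent, named in lists_by_respondent.items():
        for person in named:
            if person != respondent:  # ignore any self-nomination
                counts[person] += 1
    return counts


scores = sociometric_scores(nominations)
```

The tally ignores rank order within each list. Whether the study weighted mentions by rank is not stated here, so the unweighted count is an assumption.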


RAD Index 


It has been postulated, for purposes of this study, that leadership is 
a function of the interrelated patterns of responsibilities of the members 
of the organization. It is assumed that effective leadership exists when 
the members at all levels in the organization are making their maximal 
contribution in carrying out responsibilities essential to the success of 
the enterprise. The effective leader would be expected to influence the 
work patterns of his immediate subordinates to a greater extent than any 
other person in the organization—and this influence would presumably 
tend to enlarge rather than restrict the contribution and participation 
of subordinates. 

In order to test one phase of this hypothesis, three scales were devised 
for the purpose of measuring the estimate of an individual regarding the 
following factors: a. His level of responsibility; b. His level of authority; 
and c. The degree of authority he delegates to his subordinates. 

The scores derived from each of these three sets of scales have been 
combined in various ways to determine interrelationships. One of the 
possible combinations of these scores appears to be the following: 


$$\frac{\text{Authority Score}}{\text{Responsibility Score}} \times \text{Delegation Score}.$$


The term RAD Index, as used in this paper, refers to the above combination 
of scores. 
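
A minimal sketch of the index follows, reading the combination as the authority score divided by the responsibility score, with the quotient multiplied by the delegation score. The function name and the scale values are hypothetical; the paper does not give the range of the three scales.

```python
def rad_index(responsibility: float, authority: float, delegation: float) -> float:
    """(authority / responsibility) * delegation, as the combination is read here."""
    if responsibility == 0:
        raise ValueError("responsibility score must be non-zero")
    return authority / responsibility * delegation


# Hypothetical scale scores for one executive, for illustration only.
example_index = rad_index(responsibility=4.0, authority=2.0, delegation=3.0)
```

Because of the direction in which the items were scaled, a low value on this index goes with high leadership status in the staff studied.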

A correlation of —.36 between RAD Index and sociometric ratings 
indicates that for this particular staff there is a slight relationship be- 
tween an individual’s estimate as to his responsibility-authority-delega- 
tion status and the extent to which he is contacted by other staff members 
in getting work done. Due to the method of scaling the items, a low 
RAD Index score was associated with high leadership status. There 
was a correlation of —.57 between RAD Index and level in the organiza- 
tion structure. Per cent of time spent in contacts with persons was 
correlated —.40 with RAD Index. 

RAD Index scores appear to give some indication of the relation of an 
individual’s estimate regarding his status in the organization to his wil- 
lingness to provide his assistants with adequate scope for carrying out 
their responsibilities. It appears that the capacity of an individual to 
provide his subordinates with adequate scope for action may be con- 
ditioned to a considerable degree by the freedom or constraint he feels in 
discharging his own responsibilities. 


Work Patterns 


In accord with the hypothesis that leadership is a function of the in- 
terrelated patterns of responsibilities of the members of an organization, 
one would expect to find that work patterns are related to sociometric 
ratings and other factors. This was found to be the case. 

When sociometric ratings were plotted against percentage of time 
spent in major administrative functions, the highest correlations were 
found to be with planning and coordination (+.49 and +.46 respectively). 
Such functions as research, inspection and public relations showed a low 
negative correlation with sociometric ratings. 

Planning, coordination, and the preparation of procedures for the 
carrying out of plans were also positively correlated with per cent of time 
spent in contacts with persons. The correlation was +.39 for planning, 
+.30 for coordination, and +.26 for the preparation of procedures. 
As would be expected, research was negatively correlated with per cent 
of time spent with persons. The correlation coefficient was —.43. 
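
The coefficients reported in this section are presumably product-moment correlations; the paper does not name the formula. Under that assumption, a sketch of the computation is given below. The two lists of values are invented for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(sum_sq_x * sum_sq_y)


# Hypothetical officers: per cent of time spent in planning, and sociometric score.
pct_time_planning = [5, 10, 20, 25, 40]
sociometric_score = [1, 3, 2, 6, 8]
r = pearson_r(pct_time_planning, sociometric_score)
```

With only twenty-four positions in the staff studied, coefficients of this kind carry wide sampling error, which is consistent with the authors' caution that the relationships vary from one staff to another.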

RAD Index was found to be correlated —.60 with inspection, —.43 
with planning, and —.34 with coordination. 

A method has been developed for determining the degree of similarity 
between the work patterns of the members of an organization, and of 
showing on a chart, resembling a sociometric diagram, those clusters of 
persons whose work patterns are most individualistic. Some persons 
who form clusters in the sociometric diagrams are also clustered in the 
work patterns diagrams. 

The analyses that have been made thus far suggest that the work 
patterns of executives differ not only with such factors as level in the 
organization, but with departmental function and mission, as well as 
with changes in the objectives and activities of the organization. Further 
study will be required in order to determine what the specific patterns are 
for various types of executive positions in the lower levels. An attempt 
is being made to determine whether any uniformities exist in similar 
types of positions in different organizations. An effort is being made to 
determine, as a specific example, whether a position demanding a high 
degree of planning and coordination can be filled adequately by an 
executive whose usual pattern of work is heavily loaded with public re- 
lations, or perhaps research or supervision, or whether the position can 
best be handled by a person who has already acquired a planning-coord- 
ination pattern of work. 

Four possible criteria of leadership status have been discovered which 
indicate that planning and coordination are primary functions of top 
leadership in the staff under discussion. These four lines of evidence are 
sociometric ratings, level in the organization structure, per cent of time 
spent in contacts with persons, and RAD Index. As would be expected, 
the extent to which these possible criteria are correlated with each other 
and with other factors varies considerably from one staff to another. 

Results at the preliminary stages of the study suggest that a number 
of the methods employed hold promise for further development and 
improvement. 


Received March 12, 1948. 
Early publication. 


Adult Leadership Scales Based on the Bernreuter 
Personality Inventory * 


Helen M. Richardson 
New Jersey College for Women, Rutgers University 


In an earlier paper Hanawalt and Richardson (3) reported an analysis 
of items in the Bernreuter Personality Inventory to which the responses 
of adult men who were leaders in vocational and social activities were 
significantly different from responses of non-leaders. Responses of 
“Office-Holders” were compared with responses of “Non-Office-Holders,” 
and responses of “Supervisors” with those of “Non-Supervisors.” Analysis 
of the responses suggested the possibility of deriving leadership scales 
by assigning scoring weights to the significant Bernreuter items on the 
basis of the degree to which they elicited different responses from the 
contrasting groups of leaders and non-leaders. The construction of such 
scales is described in the present paper, and data on validity and relia- 
bility are set forth and discussed. 


Subjects 


Two main samples of subjects were used in the study: (a) the indi- 
viduals whose responses were employed in determination of the scoring 
weights to be assigned to the test items, designated as the Item-Weight- 
ing subjects; and (b) the individuals to whom the derived scales were 
applied for testing validity, designated as the Validation subjects. None 
of the Item-Weighting subjects were used in the Validation groups. In 
addition to these two main samples, some subjects independent of both 
groups were included in the computations of reliability coefficients and 
of intercorrelations. 

The subjects in all cases were men aged 26 years or over, and were 
obtained chiefly by asking psychology students at New Jersey College 
for Women to request their fathers or other older men of their acquaint- 
ance to fill out the Bernreuter questionnaire anonymously and to furnish 

* Construction and testing of the 23-item scales reported in this paper were per- 
formed by Miss Ethel M. Estoppey and Miss Eleanor E. Gruber. Miss Sally E. Henry 
performed these operations for the two lengthened scales, and determined the occu- 
pational and age distributions of the subjects in the validation groups. Assistance in 
scoring and computation was rendered by Miss Dorothy Jessee, Miss Charlotte Lossow, 
and Miss Marcia Swetland. 







additional information as to age, occupation, number of persons under 
their supervision, and offices held since the age of 21 in any organizations 
(professional, business, civic, religious, fraternal, or social). On the basis 
of the supplementary data, respondents were classified into two pairs of 
contrasting groups: Office-Holders vs. Non-Office-Holders, and Super- 
visors vs. Non-Supervisors. Office-Holders were defined as persons who 
reported having held at least two presidencies or apparently important 
chairmanships in organizations; Non-Office-Holders were those who re- 
ported no offices. Supervisors were defined as individuals who stated 
that they had fifteen or more persons under their direction or super- 
vision, supposedly in an executive capacity. They were contrasted with 
Non-Supervisors, who reported not more than one person under their 
supervision. Respondents who could not be classified in any of these 
four contrasting groups were designated as Non-Contrasting subjects. 
The number of subjects in each group was as follows: 














                    Office-     Non-Office-     Supervisors     Non-Supervisors 
                    Holders     Holders 
Item-Weighting      57          70              90              88 
Validation          48          56              44              45 
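
The grouping rules described above can be sketched as follows. The function and argument names are hypothetical, and the sketch takes the counts at face value; it does not capture the judgments the study made as to whether a chairmanship was "apparently important" or whether supervision was in an executive capacity.

```python
from typing import List, Optional


def office_group(major_offices: int, total_offices: int) -> Optional[str]:
    """Two or more presidencies or important chairmanships, or no offices at all."""
    if major_offices >= 2:
        return "Office-Holder"
    if total_offices == 0:
        return "Non-Office-Holder"
    return None  # neither extreme


def supervision_group(persons_supervised: int) -> Optional[str]:
    """Fifteen or more persons supervised, or not more than one."""
    if persons_supervised >= 15:
        return "Supervisor"
    if persons_supervised <= 1:
        return "Non-Supervisor"
    return None  # neither extreme


def classify(major_offices: int, total_offices: int, persons_supervised: int) -> List[str]:
    """Return every contrasting group a respondent falls in, else Non-Contrasting."""
    groups = [
        group
        for group in (
            office_group(major_offices, total_offices),
            supervision_group(persons_supervised),
        )
        if group is not None
    ]
    return groups or ["Non-Contrasting"]
```

A respondent may fall in one group of each pair at once, so that the same man can be counted both as an Office-Holder and as a Supervisor.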

Table 1 

Occupational Distribution (Percentages) of the Samplings Used in Item-Weighting 
and Validation Compared with 1940 Distribution of Employed 
Adult Males in New Jersey 

                                             Percentages     Percentages     Percentages 
                                             of Employed     in Item-        in Validation 
                                             Males in        Weighting       Groups 
Occupational Group                           New Jersey      Groups 

Professional                                  5.9            30.2            27.6 
Semi-professional                             1.6             3.1             4.5 
Farmers and Farm Managers                     1.8             4.7             2.3 
Proprietors, Managers, and Officials, 
  except Farm                                11.5            26.0            25.4 
Clerical, Sales                              17.3            23.2            19.4 
Craftsmen, Foremen                           19.4             7.0            11.2 
Operatives                                   22.2             3.1             2.2 
Domestic Service                              0.4            --              -- 
Service, except Domestic                      7.5             2.7             6.7 
Farm Laborers (paid) and Farm Foremen         1.7            --              -- 
Farm Laborers (unpaid family workers)         0.3            --              -- 
Laborers, except Farm                         9.6            --               0 
Occupation not Reported                       1.0 

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In the Item-Weighting, groups, 23 of the 57 Office-Holders were also 
Supervisors, 7 were Non-Supervisors and the remaining 27 were non- 
contrasting with respect to supervisorship; 14 of the 70 Non-Office- 
Holders were Supervisors, 22 were Non-Supervisors, and 34 were non- 
contrasting. In the Validation groups, 20 of the 48 Office-Holders were 
Supervisors, 4 were Non-Supervisors, and 24 were non-contrasting with 
regard to supervisorship; 17 of the 56 Non-Office-Holders were Super- 
visors, 26 were Non-Supervisors, and 13 were non-contrasting. 

All of the Item-Weighting subjects filled out their questionnaires dur- 
ing the years 1939-1942, when they furnished the basis for the earlier 
item-analysis by Hanawalt and Richardson (3). Data from the Vali- 
dation groups were obtained in 1943-1944, except that ten members of 
the Validation group of Non-Office-Holders were drawn from a surplus 
of Non-Office-Holders obtained in the earlier years but not included in 
the Item-Weighting groups. 

The occupational distribution of the subjects in the contrasting groups
is given in Table 1, together with the distribution of employed adult
males in New Jersey according to the 1940 census (10). The column
headed “Item-Weighting Groups” is based on a total of 258 respondents¹
in 1939-1942 who were classified in the contrasting groups, and includes
the surplus Non-Office-Holders referred to above. The column headed
“Validation Groups” is based on 134 cases¹ classified in the contrasting
groups, and includes 10 Non-Office-Holders from the unused 1939-1942
subjects along with 124 obtained in 1943-1944. This table is not meant
to imply that our groups should have shown the same distribution as the
census figures, but is presented merely to indicate the composition of our
groups and to permit a comparison of the occupational distribution of
the Item-Weighting and Validation subjects. It is evident that the two
sets of subjects are very similar in occupational distribution.

Table 2, presenting the age distribution of the various classes of subjects
in the Item-Weighting and Validation groups, shows the change
that might be expected with a shift to war years from years that were
largely pre-war. In the Validation groups (1943-1944) the percentages
in the age group from 26-35 are considerably reduced, especially among
the Non-Office-Holders and Non-Supervisors, which in the earlier period
had a large proportion of younger men. In both the Item-Weighting and
the Validation groups, the percentages of Office-Holders and Supervisors
are greatest in the ages above 45. This may indicate that, to some degree,
leadership which is merely potential in the earlier years is realized later
in life. Existence of such latent leadership in the younger groups of
non-leaders might be expected to have an adverse effect on the validation
of leadership scales.

¹ The totals cited here for the Item-Weighting and Validation groups are the
numbers of different individuals serving as subjects. In these totals, a person with the dual
classification of Office-Holder and Supervisor, for example, was counted only once,
though he figures in both these columns in the summary. The fact that the two sets
of figures cannot be made to check by allowing for the duplications is due to the
presence of the “surplus Non-Office-Holders” mentioned above.

Table 2

Age Distribution (Percentages) of Item-Weighting Subjects
and of Validation Subjects

                        Office-    Non-Office-                 Non-
Age                     Holders    Holders       Supervisors   Supervisors    All

Item-Weighting Groups
  26-35                  12.2        41.4          10.0           60.2        32.9
  36-45                  19.2        17.2          26.6           10.2        18.9
  46-                    68.4        41.4          63.3           29.5        48.1

Validation Groups
  26-35                   4.2        26.8          11.4           24.4        18.7
  36-45                  25.0        33.9          29.6           31.1        29.9
  46-                    70.8        39.3          59.1           44.4        51.4





Preliminary Study: 23-Item Scales 


The item-analysis by Hanawalt and Richardson (3) which served as 
the point of departure for the present study revealed 23 items for which 
the chi-square test indicated a significant difference (P value of .05 or less) 
in the distribution of “Yes,” “No,” and “?” responses by Office-Holders 
and Non-Office-Holders, and 23 items for which there was a significant 
difference in the responses of Supervisors and Non-Supervisors. Nine 
items were common to the two lists. For a discussion of these items the 
reader is referred to the earlier paper. In constructing scales for dis- 
tinguishing leaders from non-leaders in the two fields, a preliminary 
trial was essayed in which each scale was made up of the 23 items indi- 
cated above. 
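
The item-selection step described above rests on a chi-square test of a two-by-three table (two groups by “Yes,” “No,” and “?” responses). A minimal sketch follows; the response counts are hypothetical and are not taken from the earlier paper, and 5.991 is the standard tabled value of chi-square for 2 degrees of freedom at the .05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Tabled critical value for df = (2 - 1) * (3 - 1) = 2 at P = .05.
CRITICAL_05_DF2 = 5.991

# Hypothetical counts of "Yes", "No", "?" on a single item.
office_holders = [40, 12, 5]       # 57 respondents
non_office_holders = [30, 32, 8]   # 70 respondents

stat = chi_square([office_holders, non_office_holders])
significant = stat > CRITICAL_05_DF2  # item would be retained if True
```

An item whose statistic exceeds the critical value would, on this criterion, be counted among the discriminating items for that pair of groups.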

Determination of Item Weights for the 23-Item Scales. Scoring
weights were derived by Kelley’s revised formula (4), following the
method for use with semi-equalized four-fold tables described by Strong
(7, pp. 611-615), and employed by him in deriving scoring weights for
his Vocational Interest Tests. Strong and Carter (8) found this method
slightly superior to Kelley’s original formula, which was used by Strong
in an earlier scoring of the interest tests, and which was followed by
Bernreuter (1). Derivation of the weights was facilitated by the use of
a chart prepared by Strong (6). This chart provided for a range of
weights from +4 to -4. Plus values were assigned to responses which
were more characteristic of leaders, negative values to responses given
preponderantly by non-leaders.
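
Once weights in the range +4 to -4 are attached to each response, scoring a questionnaire amounts to summing the weights of the answers given. The sketch below illustrates only that summation; the item numbers and weights in `WEIGHTS` are hypothetical and are not values from Strong's chart or from the present scales.

```python
# Hypothetical weights: item number -> weight for each possible response.
WEIGHTS = {
    1: {"Yes": 3, "No": -2, "?": 0},
    2: {"Yes": -1, "No": 2, "?": 0},
    3: {"Yes": 4, "No": -4, "?": -1},
}


def scale_score(responses, weights=WEIGHTS):
    """Sum the weights of a respondent's answers; items not on the scale add 0."""
    total = 0
    for item, answer in responses.items():
        weight = weights.get(item, {}).get(answer, 0)
        if not -4 <= weight <= 4:
            raise ValueError(f"weight {weight} for item {item} is outside -4..+4")
        total += weight
    return total
```

For example, `scale_score({1: "Yes", 2: "No", 3: "Yes"})` adds 3, 2, and 4; a respondent answering in the direction typical of non-leaders receives a negative total.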

When the responses of the 57 Office-Holders and 70 Non-Office- 
Holders in the Item-Weighting group were scored according to the Office- 
Holder scale and means for the two groups were computed, t, based on 
the standard error of the difference between the means, was 10.97. 

Validity and Reliability of the 23-Item Scales. The 23-item scales for
Office-Holders and for Supervisors respectively were then tested by application
to entirely new contrasting groups of 30 Office-Holders vs. 30
Non-Office-Holders and 31 Supervisors vs. 38 Non-Supervisors. Results
of this try-out are briefly summarized in Table 3. In view of the small
number of subjects, the t’s indicated fair validity, especially for the
Office-Holder scale, but the reliability left something to be desired.


Table 3

Validity and Reliability Data from Try-Out of Preliminary 23-Item Scales
for Office-Holders and Supervisors

                                             Office-Holder      Supervisor
                                             Scale              Scale
                                             (30 OH; 30 NOH)    (31 S; 38 NS)

Difference between Mean Scores of
  Contrasting Groups                             21.5               8.3
SEdiff.                                           3.82              2.71
t                                                 5.63              3.06
Percentage of Overlapping                         0.0              39.8
rbis                                              .75               .4
Reliability Coefficient (split-half, after
  application of Spearman-Brown formula)          .72               .52

The next procedure was to see whether, without sacrificing validity, 
the reliability could be increased by lengthening the test through in- 
cluding additional items of substantial, though lower, discriminating 
value. 


Construction of a 101-Item Scale for Office-Holders 
and an 84-Item Scale for Supervisors 


Determination of Item Weights. Responses of the Item-Weighting
groups to the previously unused Bernreuter items were evaluated according
to Kelley’s revised formula, by referring the comparative percentages
of “Yes,” “No,” and “?” responses of leaders and non-leaders (Office-Holders
vs. Non-Office-Holders, and Supervisors vs. Non-Supervisors)
to Strong’s chart, following the procedure by which the scoring weights
had been derived for the 23-item Office-Holder and Supervisor scales.
All items which received a weight of 1 or more according to the chart
were included in the augmented scales. With this more lenient criterion
for item-inclusion, the Office-Holder scale was expanded to a total of 101
items, and the Supervisor scale to 84 items, 74 items being common to
both lists. When the lengthened Office-Holder test was applied to the
57 Office-Holders and 70 Non-Office-Holders in the Item-Weighting group,
t, based on the standard error of the difference between the means, was
found to be 11.08. Comparison of this figure with that obtained from
the 23-item test (t = 10.97) indicated that validity had not been reduced
by inclusion of the additional items.


Table 4

Comparison of Scores of Office-Holders, Non-Office-Holders, and Non-Contrasting
Subjects on 101-Item Office-Holder Scale

                                                Mean                      Percent.
                         N     Mean     SD      Diff.   SEdiff.    t      Overlap   rbis

Office-Holders           48    55.1    21.30
  OH-NOH                                        31.7     4.59     6.91      7.6     .71
Non-Office-Holders       56    23.4    25.14
  OH-NC                                         13.7     4.09     3.35
Non-Contrasting
  Subjects               96    41.4    25.99
  NC-NOH                                        18.0     4.32     4.17






Validity. The validity of the 101-item Office-Holder scale and of the
84-item Supervisor scale was tested by application of the lengthened
scales respectively to the Validation groups of 48 Office-Holders vs. 56
Non-Office-Holders and 44 Supervisors vs. 45 Non-Supervisors. Tables
4 and 5 give the results in terms of t, percentages of overlapping (percentage
of the non-leaders who exceed the median of the leaders), and
biserial correlations. Some data are also included for comparing Non-Contrasting
subjects with the criterion groups on each scale. The Office-Holder
scale appears, as in the preliminary 23-item tests, to be considerably
more discriminating than the Supervisor scale. In comparing the t’s
for the shorter and longer forms of each test, one must bear in mind that
the 23-item tests were applied to smaller samples of subjects. If the
number of Validation subjects in the preliminary try-out is adjusted to
equal the number in the later tests of validity, the t’s for the shorter
tests are increased to 7.36 for the Office-Holder scale and 3.47 for the
Supervisor scale. It still appears, however, that lengthening the tests
has not significantly lowered the validity.
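
The index called “percentage of overlapping” is defined above as the percentage of the non-leaders who exceed the median of the leaders. A minimal sketch of that definition follows; the function name is an assumption, and the scores used in the usage note are hypothetical, not data from the study.

```python
from statistics import median


def percent_overlap(leader_scores, non_leader_scores):
    """Percentage of non-leaders whose score exceeds the leaders' median."""
    cutoff = median(leader_scores)
    exceeding = sum(1 for score in non_leader_scores if score > cutoff)
    return 100.0 * exceeding / len(non_leader_scores)
```

With hypothetical leader scores of 40, 50, 60, 70, 80 (median 60) and non-leader scores of 10, 20, 30, 65, 90, two of the five non-leaders exceed the cutoff. The lower this percentage, the better the scale separates the two groups.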

It is not surprising that the mean scores for Non-Contrasting subjects,
while lying as they should between the means for the contrasting groups,
are closer to the means for Office-Holders and Supervisors respectively
than to Non-Office-Holders and Non-Supervisors. Non-Contrasting subjects
were respondents who reported offices or persons under their supervision,
but to a lesser extent than the criterion subjects. The t for the
difference between the means of Non-Contrasting subjects and Non-Supervisors
is significant at the .02 level, and borders on the .01 level.
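
Each t in Tables 4 and 5 is the difference between two group means divided by the standard error of that difference. The sketch below simply repeats this division for the mean differences and standard errors reported in those tables, with decimal points read as 41.4, 4.32, 15.3, 17.4, and 3.49 where the print is unclear; the dictionary keys are labels chosen here for illustration.

```python
def critical_ratio(mean_diff, se_diff):
    """t (critical ratio): difference between means over its standard error."""
    return mean_diff / se_diff


# (mean difference, standard error of the difference) as read from Tables 4 and 5.
COMPARISONS = {
    "OH-NOH": (31.7, 4.59),
    "OH-NC": (13.7, 4.09),
    "NC-NOH": (18.0, 4.32),
    "S-NS": (15.3, 3.84),
    "S-NC": (5.1, 3.49),
    "NC-NS": (10.2, 3.78),
}

ratios = {name: round(critical_ratio(diff, se), 2)
          for name, (diff, se) in COMPARISONS.items()}
```

Rounded to two places, the quotients agree with the tabled t values, which serves as a check on the reading of the figures.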


Table 5

Comparison of Scores of Supervisors, Non-Supervisors, and Non-Contrasting
Subjects on 84-Item Supervisor Scale

                                                Mean                      Percent.
                         N     Mean     SD      Diff.   SEdiff.    t      Overlap   rbis

Supervisors              44    32.7    16.46
  S-NS                                          15.3     3.84     3.98     19.7     .49
Non-Supervisors          45    17.4    19.23
  S-NC                                           5.1     3.49     1.46
Non-Contrasting
  Subjects               36    27.6    14.32
  NC-NS                                         10.2     3.78     2.70






Reliability. For computing the reliability of the lengthened scales, 
the four Validation groups were combined in a single list of 134 subjects. 
In order to avoid the possibility of spuriously increasing the reliability
coefficients by influence of the contrasting groups, 96 Non-Contrasting 
subjects were added to the list for computing reliability of the Office- 
Holder scale. Since the reliability coefficient for this list of 230 cases 
actually proved to be greater (by .01) than the coefficient for the 134 
subjects, it was considered unnecessary to score all the Non-Contrasting 
questionnaires on the Supervisor scale. The coefficients of reliability 
were computed by using the split-half method (Odd-Even) and applying 
the Spearman-Brown prophecy formula. For the Office-Holder scale 
(N = 230) the reliability coefficient is .81; for the Supervisor scale 
(N = 170) the coefficient is .69. 
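
The split-half procedure named above (odd-even halves, stepped up by the Spearman-Brown formula) can be sketched as follows. The layout of `item_scores`, one list of weighted item scores per respondent, is an assumption made for illustration, and the figures in the usage note are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability, corrected to full length."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction for a test twice the length of each half.
    return 2 * r_half / (1 + r_half)
```

For four hypothetical respondents with item scores `[[1, 2], [2, 1], [3, 4], [4, 3]]`, the half-scores correlate .60, and the corrected coefficient is .75.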

Intercorrelations. A few intercorrelations are reported in Table 6.
Correlations with Bernreuter scores have been computed only for the
Office-Holder scale, since the reliability and validity of the Supervisor
scale did not seem to justify the expenditure of time in scoring, tabulation,
and computation. For correlation with the Office-Holder scores, F1-C
was selected because this scale gave the most significant difference (critical
ratio = 3.97) between Office-Holders and Non-Office-Holders in the
previous study by Richardson and Hanawalt (5). B1-N and B3-I
yielded critical ratios almost as great as F1-C, but have not been included
in the present intercorrelations because of their high correlation
with F1-C. B4-D was selected for correlation computation because,
along with its value in discriminating between Office-Holders and
Non-Office-Holders (critical ratio = 3.69), it carries a lower correlation than
B1-N and B3-I with F1-C.

Table 6

Correlation of Office-Holder Scores with Other Scales

                                                  N

Office-Holder Scale and Supervisor Scale
Office-Holder and F1-C                            69
Office-Holder and B4-D                            69





Discussion 


When the correlation between the Office-Holder scale and F1-C was
found to be slightly greater than the reliability coefficient of the former
scale, the first thought that came to mind was that of a certain king who
“with ten thousand men, marched up the hill and down again.” Later
reflection, however, suggested that mountains are frequently ascended
for no other purpose than exploration, surveying, and mapping. Even
if the conclusion is that F1-C (or B1-N) would serve the purpose of
discriminating Office-Holders about as well as our Office-Holder scale, some
exploratory value may be found in the study which we have carried
through.

In the first place, the possibility of constructing such leadership scales
from the Bernreuter items was definitely an open question, suggested
by the previous item-analysis by Hanawalt and Richardson (3). As a
matter of fact, the Office-Holder scale and the much less discriminating
Supervisor scale actually have fulfilled our expectation that they would
distinguish Office-Holders from Non-Office-Holders and Supervisors from
Non-Supervisors better than any of the Bernreuter scales. The Office-Holder
scale is considerably superior to any Bernreuter scale in this respect.
For convenience in comparison, Tables 7 and 8 give the critical
ratios (mean difference/SEdiff.) from Richardson and Hanawalt’s earlier
study (5) of the Bernreuter scales, along with the t’s from our Office-Holder
and Supervisor scales. Reliability coefficients from the Bernreuter
Manual (2) and from the present study are also included. In
reliability, the Office-Holder scale is not far behind F1-C, and it is much
more discriminating. At this point attention may be called again to the
fact that the Validation groups of 48 Office-Holders, 56 Non-Office-Holders,
44 Supervisors, and 45 Non-Supervisors are entirely independent
of the Item-Weighting groups from which the scale weights were derived.
Application of the Office-Holder scale to the Item-Weighting groups of
Office-Holders and Non-Office-Holders yielded a t of 11.08.


Table 7

Office-Holders vs. Non-Office-Holders: Critical Ratios (Mean Difference/SEdiff.)
on Bernreuter and Office-Holder Scales. Reliability
Coefficients of these Scales

                      N              Critical     N in           Coefficient
                                     Ratio        Reliability    of
Scale              OH     NOH        or t         Study          Reliability

B1-N               57     116        -3.91        128            .88
B2-S               57     116        -0.18        128            .85
B3-I               57     116        -3.81        128            .85
B4-D               57     116         3.69        128            .88
F1-C               57     116        -3.97        100            .86
F2-S               57     116        -2.33        100            .78
Office-Holder      48      56         6.91        230            .81


Table 8

Supervisors vs. Non-Supervisors: Critical Ratios (Mean Difference/SEdiff.)
on Bernreuter and Supervisor Scales. Reliability
Coefficients of these Scales

                      N              Critical     N in           Coefficient
                                     Ratio        Reliability    of
Scale              S      NS         or t         Study          Reliability

B1-N               90     88         -2.94
B2-S               90     88          1.90
B3-I               90     88         -2.11
B4-D               90     88          2.72        See Table 7
F1-C               90     88         -3.48
F2-S               90     88          1.37
Supervisor         44     45          3.98        170            .69





The present study agrees with the earlier work by Hanawalt and
Richardson (3) in showing throughout that the Office-Holders differ from
the Non-Office-Holders more than the Supervisors differ from the
Non-Supervisors. A larger number of discriminating items were found for the
Office-Holder scale. The indices of validity (t, biserial r, and percentage
of overlapping) show greater differentiation between Office-Holders and
Non-Office-Holders than between Supervisors and Non-Supervisors.
Supervisorship may actually be less tied up than office-holding with the 
personality pattern, and more subject to factors outside the person, or 
at least to factors outside the Bernreuter Inventory items. We cannot 
claim to have developed a highly reliable and valid device for selecting 
supervisors. We have shown, however, what can be done by way of 
developing from the Bernreuter Inventory a scale based on the items 
which best distinguish a group of Supervisors from Non-Supervisors. 
A study based on a group of supervisors of proved excellence might be 
more rewarding. Uhrbrock and Richardson (9) have reported a Test 
for the Selection of Supervisors in which they obtained a correlation of 
.71 + .03 between score on 85 significant test items and a criterion score 
based on ratings of supervisors by superintendents. These investigators 
included parts of the Thurstone Personality Schedule among the 820 
psychological and interest items which they tried out, but they do not 
indicate how useful they found the Thurstone items, which most closely 
resemble the material used in the present study. 

Objections to such questionnaires as the Bernreuter or ours for the 
purpose of candidate selection are too well-known to be repeated here. 
On the other hand, the data at least from our Office-Holder test are 
evidence of a certain validity in the method when responses are made 
seriously and in good faith by persons who have nothing to gain from 
making a certain showing. The Office-Holder scale might be useful in 
counseling a person who sought an appraisal of his attributes. 

Resemblances and differences between the characteristics of Super- 
visors and of Office-Holders are indicated by correspondence and differ- 
ence in the discriminating items for our two scales (74 out of the 84 
Supervisor items being significant also for Office-Holders) and have been 
discussed at length in the earlier paper by Hanawalt and Richardson (3). 
Item-analysis indicates that both types of leaders are characterized by 
dominance and good adjustment, but that Supervisors tend to be more 
self-sufficient than Office-Holders. A measure of community between 
the characteristics of the two groups is given in the correlation of .82 be- 
tween our Office-Holder and Supervisor scales. 

The possibility of increasing the reliability of a test by adding to its
length has been demonstrated by the difference in the reliability coefficients
for our shorter and longer tests. That the increase fell short of
what would have been predicted from the Spearman-Brown prophecy
formula need not be taken to discredit the formula. The items added to
lengthen the scales were not strictly comparable to the original items,
being of lower discriminating value. If the additional items had met
this criterion, the Office-Holder test, expanded to 4.4 times its original
length, should have increased its reliability from .72 to .92 instead of
merely to .81; and the Supervisor test, augmented to 3.7 times as many
items as the number in the shorter form, should have had its reliability
raised from .52 to .80 instead of to .69.
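
The predicted figures just cited follow from the Spearman-Brown prophecy formula, r_k = k r / (1 + (k - 1) r), where r is the reliability of the shorter test and k the factor by which it is lengthened. A brief check, using the item counts of 23, 101, and 84 given in the text (the variable names are chosen here for illustration):

```python
def spearman_brown(r, k):
    """Predicted reliability of a test lengthened k times, given reliability r."""
    return k * r / (1 + (k - 1) * r)


# Office-Holder scale: 23 items lengthened to 101 (about 4.4 times).
predicted_office_holder = spearman_brown(0.72, 101 / 23)

# Supervisor scale: 23 items lengthened to 84 (about 3.7 times).
predicted_supervisor = spearman_brown(0.52, 84 / 23)
```

Rounded to two places these come to .92 and .80, the values stated above, against obtained coefficients of .81 and .69.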


Summary 


1. By means of Kelley’s revised formula, with the aid of Strong’s 
weighting chart, scoring weights were derived for Bernreuter items to 
which responses of adult men who were leaders in vocational and social 
activities were significantly different from responses of non-leaders. 
Item weights for an Office-Holder scale were obtained from responses 
of 57 Office-Holders and 70 Non-Office-Holders; weights for a Supervisor 
scale were derived from responses of 90 Supervisors and 88 Non-Super- 
visors. 

2. Brief scales based respectively on the 23 items found by chi-square
test to yield P values of .05 or less in an earlier study of the same
groups (3) were applied to groups of new subjects. The 23-item Office-Holder
scale yielded a t of 5.63 for the difference between the means of
30 Office-Holders and 30 Non-Office-Holders, with a reliability coefficient
of .72. For the 23-item Supervisor scale, applied to 31 Supervisors and
38 Non-Supervisors, t was 3.06 and the reliability coefficient was .52.
Biserial r’s and percentages of overlapping are also reported for each scale.

3. By including all items which received a weight of 1 or more according
to Strong’s chart, the Office-Holder scale was lengthened to 101 items,
and the Supervisor scale to 84 items. Validity and reliability were tested
by application of the scales to groups of subjects entirely separate from
those used in the item-weighting, but including the cases used in testing
the 23-item scales. For the Office-Holder scale, t (based on means of 48
Office-Holders and 56 Non-Office-Holders) was 6.91; biserial r was .71;
percentage of overlapping 7.6; reliability coefficient .81. For the Supervisor
scale, t (44 Supervisors, 45 Non-Supervisors) was 3.98; biserial r .49;
percentage of overlapping 19.7; reliability coefficient .69. Lengthening the
scales was thus found to have increased their reliability without lessening
their validity.

4. It is concluded that the factors which determine supervisorship are
less clearly defined by items in the Bernreuter Inventory than the factors
which determine office-holding. The 101-item Office-Holder test might
be found useful for guidance though not for candidate selection. It
discriminates between Office-Holders and Non-Office-Holders considerably
better than any of the Bernreuter scales. The correlation between the
Office-Holder scale and F1-C, however, is slightly greater than the
reliability coefficient of the former.


Received October 2, 1947. 

References

 1. Bernreuter, R. G. The theory and construction of the Personality Inventory.
    J. soc. Psychol., 1933, 4, 387-405.

 2. Bernreuter, R. G. Manual for the personality inventory. Stanford Univ.: Stanford
    Univ. Press, 1935. Pp. 6.

 3. Hanawalt, N. G., and Richardson, H. M. Leadership as related to the Bernreuter
    personality measures: IV. An item analysis of responses of adult leaders and
    non-leaders. J. appl. Psychol., 1944, 28, 397-411.

 4. Kelley, T. L. The scoring of alternative responses with reference to some criterion.
    J. educ. Psychol., 1934, 25, 504-510.

 5. Richardson, H. M., and Hanawalt, N. G. Leadership as related to the Bernreuter
    personality measures: III. Leadership among adult men in vocational and social
    activities. J. appl. Psychol., 1944, 28, 308-317.

 6. Strong, E. K., Jr. Chart for the computation of weights for interest test items.
    Stanford Univ., 1935. (Photostatic copy available from author.)

 7. Strong, E. K., Jr. Vocational interests of men and women. Stanford Univ.:
    Stanford Univ. Press, 1943. Pp. xxix + 746.

 8. Strong, E. K., Jr., and Carter, H. D. Efficiency plus economy in scoring an interest
    test. J. educ. Psychol., 1935, 26, 579-586.

 9. Uhrbrock, R. S., and Richardson, M. W. Item analysis: the basis for constructing
    a test for forecasting supervisory ability. Person. J., 1933, 12, 141-154.

10. U. S. Bureau of the Census: Sixteenth Census of the United States, 1940. Population:
    Characteristics of the population of New Jersey, 2nd Series. Washington: U. S.
    Gov. Printing Office, 1942.

























Identification of Cola Beverages, I. First Study. 


N. H. Pronko and J. W. Bowles, Jr. 
University of Wichita 


When subjects are asked to taste and identify four samples of cola 
beverages, only three of which are generally known, what pattern of iden- 
tifications will be observed in the situation? Will the Ss apply four 
different naming categories or will they tend to repeat one of the pre- 
viously used brand names? What relationships will appear between (a) 
the Ss’ judgments when given four different cola drinks to identify as 
compared with (b) Ss’ judgments when the same cola drink is presented 
four times? If (a) and (b) should be significantly different, then appar- 
ently S is making a taste discrimination on the basis of the actual chemical 
and physical properties of the stimulus objects. On the other hand, if 
(a) and (b) should be essentially similar in their patterning, then Ss’ 
judgments must be explained otherwise. The present experiment was 
designed to yield an answer to these related problems. 


Procedure 


The subjects of the present investigation consisted of 168 college 
students, for the most part members of the General Psychology courses. 
There were 117 males and 51 females.

Part I. Each of 108 Ss was admitted singly into the experimental room 
and was invited to sit down at a small table containing a tray with four 
1 oz. samples of cola beverages. The following instructions were then 
read to him. 

We would like to have you taste and identify some cola drinks. You will 
be told in what order and when you are to drink them. After you have finished 
each sample, report your identification to E and, by referring to the scale placed
before you, indicate the particular degree of certainty that applies to each 
judgment. (This printed scale placed in front of the subject employed the 
following four steps: (1) very certain; (2) moderately certain; (3) moderately 
uncertain; (4) very uncertain.) After each stimulus presentation, take enough 
water from the paper cup before you to rinse your mouth well. 


From the tray placed before him, S picked up and drank in a certain
order the contents of four one-oz. glasses labelled w, x, y, and z, symbolizing,
respectively, Coca Cola, Pepsi Cola, RC (Royal Crown) Cola, and Vess
Cola. After each drink the S’s identification and degree of certainty of
judgment were immediately recorded. His name and other pertinent
information were also obtained and recorded between drinks, which were
spaced approximately a minute apart. At the conclusion of the experiment,
S was asked to say nothing about the experiment except that it
was a test of taste discrimination, although it should be emphasized that
at no time could S see any part of the preparation of the beverages or get
any cues that might indicate the nature of the experiment. At all times,
all bottles, etc. were kept out of sight behind screens. The beverages
were kept iced until used so that whatever temperature variation may
have obtained was slight and was constant for all four beverages. The
order of presentation of the four beverages, pre-determined, was such
that each of the four stimuli appeared four times in each of the first,
second, third and fourth positions, forward and backward. This
counter-balanced order was used to preclude the operation of effects
resulting from position of stimuli or from stimulus interactions in the mouth.
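
The exact schedule of orders is not given in the text. As a sketch only, one way to obtain a counter-balanced set is to rotate the four labels cyclically and then add each rotation in reverse, so that every stimulus falls equally often in every position. The function names here are hypothetical, and this construction is an assumption, not a reconstruction of the schedule actually used.

```python
from collections import Counter

STIMULI = ["w", "x", "y", "z"]  # glass labels used in the experiment


def cyclic_orders(stimuli):
    """Latin-square orders obtained by rotating the list one step at a time."""
    n = len(stimuli)
    return [[stimuli[(start + i) % n] for i in range(n)] for start in range(n)]


def counterbalanced_orders(stimuli):
    """Each cyclic order forward, followed by the same orders backward."""
    forward = cyclic_orders(stimuli)
    backward = [list(reversed(order)) for order in forward]
    return forward + backward


def position_counts(orders):
    """How often each stimulus appears in each serial position (0-based)."""
    counts = Counter()
    for order in orders:
        for position, stimulus in enumerate(order):
            counts[(stimulus, position)] += 1
    return counts
```

In the eight orders so produced, each label occupies each of the four positions exactly twice; repeating the set across subjects keeps position effects evenly spread over the beverages.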

Part II. In Part II, 60 Ss were administered the same cola drink at
each of the four trials, each group being evenly divided with respect to 
the four cola brands. Thus, 15 Ss were given all Coca Cola; 15, all Pepsi 
Cola; 15, all RC Cola; and 15, all Vess Cola. In all other respects, the 
procedure was the same as in Part I. 


Results and Discussion 


Inspection of Table 1 will show that, primarily, our group of 108 Ss 
used three categories of identification or naming response with a slight 
sprinkling of less well-known or less advertised product names. When 
all four brands are considered, the total Coca Cola, Pepsi Cola, RC Cola 
and Vess Cola judgments employed are respectively 132, 147, 112, and 4. 


Table 1

Showing the Distribution of 432 Identification Responses When Each of the 108 Ss
was Presented in Turn, but in Counter-balanced Order, with a 1 oz.
Sample of Coca Cola, Pepsi Cola, RC Cola and Vess Cola

Note: In this part of the experiment, Part I, each S was given four different brands
of cola drink.

                     Frequency of Ss’ Various Identification Responses

Brand of
Beverage                                             Dr.
Given         C.C.   Pep.   R.C.   Vess   Cleo  Rock  Pep.  Other  D.K.  Total

Coca Cola      46     34     21      1      0     0     1     0      5    108
Pepsi Cola     25     50     23      1      3     1     0     4      1    108
RC Cola        32     29     39      1      3     0     2     1      1    108
Vess Cola      29     34     29      1      4     1     3     2      5    108
Total         132    147    112      4     10     2     6     7     12    432


Table 1a

Identification of Cola Beverages by 108 Ss When Each S was Presented a
Sample of Each of Four Brands

                                 Brand of Cola Presented
                  Coca Cola    Pepsi Cola    RC Cola     Vess Cola     Totals
Identification    No.    %     No.    %     No.    %     No.    %     No.    %

Correct            46   43      50   46      39   36       1    1     136   31
Incorrect          62   57      58   54      69   64     107   99     296   69
Totals            108  100     108  100     108  100     108  100     432  100





The last brand is clearly out of line with the rest, for out of 108 presenta- 
tions the Vess response shows up four times, only once correctly. This 
last brand is also misidentified as Dr. Pepper three times and as Cleo Cola 
four times; but more important still is the finding that it was misidentified 
as Coca Cola 29 times, as Pepsi Cola 34 times, and as RC Cola 29 times. 
Why this particular pattern of namings? 

The data for the other brands presented are also significant. Note
that RC Cola is correctly named 39 times, yet is misidentified as Coca
Cola 32 times and as Pepsi Cola 29 times. Pepsi Cola is “correctly
identified” 50 times but it is incorrectly identified almost as many times (48)
with Coca Cola and RC Cola combined. Although Coca Cola is “correctly
identified” as Coca Cola 46 times, it is misidentified as Pepsi Cola
or RC Cola a total of 55 times.
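
The correct-identification counts cited above (46, 50, and 39, with 1 for Vess Cola) can be turned into percentages of the 108 presentations of each brand. The sketch below does only that arithmetic; the variable names are chosen here for illustration.

```python
PRESENTATIONS_PER_BRAND = 108

# Correct identifications of each brand in Part I, as reported in Table 1.
correct = {"Coca Cola": 46, "Pepsi Cola": 50, "RC Cola": 39, "Vess Cola": 1}

# Percentage correct for each brand, rounded to the nearest whole per cent.
percent_correct = {brand: round(100 * count / PRESENTATIONS_PER_BRAND)
                   for brand, count in correct.items()}

total_correct = sum(correct.values())
total_presentations = PRESENTATIONS_PER_BRAND * len(correct)
overall_percent = 100 * total_correct / total_presentations
```

The total of 136 correct responses out of 432 presentations comes to a little under 31.5 per cent, so that fewer than a third of all identifications were correct.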

From a slightly different approach Table 1a compares percentages of
correct with incorrect identifications for each of the four beverages. In
terms of total identifications, it will be noted that only 31% are correct
while 69% are misidentified.


Table 2

Showing the Distribution of 240 Identification Responses When Each of 60 Ss was
Presented with Four One-oz. Glasses of Either Coca Cola,
Pepsi Cola, Royal Crown or Vess Cola

Note: In this section of the experiment, Part II, each subject received four samples
of the same beverage.

                     Frequency of Identification Responses

Brand of
Beverage                                              Dr.
Presented     C.C.   Pepsi   R.C.   Vess   Cleo  Rock  Pep.  Other  D.K.  Total

Coca Cola      25     21       6
Pepsi Cola     27     21      10
RC Cola        19     20      14
Vess Cola      18     18      16
Totals         89     80      46


Results for our second group of 60 Ss in Part II, where each group of 
15 Ss was given four samples of the same cola beverage, are not essentially 
different from the picture obtained from the 108 Ss of Part I that got four 
different drinks. The total number of Coca Cola responses is 89; Pepsi, 
80; RC, 46; Vess, 8 and a scattering of others as shown in Table 2. Note 
that although 25 identification responses for Coca Cola are correct, never- 
theless, this same label is misapplied almost as many times (19) incor- 
rectly to RC and more times (27) incorrectly to Pepsi Cola. As a matter 
of fact, one S identified Coca Cola as Seven Up. As for Pepsi Cola, al- 
though it is correctly identified 21 times, it is more often (27 times) mis- 
identified as Coca Cola. In line with our hypothesis, RC Cola is iden- 


Table 2a


Identification of Cola Beverages by 60 Ss When Each S was Presented
Four Samples of Same Brand


                            Brands of Cola Presented

               Coca Cola    Pepsi Cola   R.C. Cola    Vess Cola     Totals
Identifi-
cation         No.    %     No.    %     No.    %     No.    %     No.    %

Correct         25   42      21   35      14   23       3    5      63   26
Incorrect       35   58      39   65      46   77      57   95     177   74

Totals          60  100      60  100      60  100      60  100     240  100

tified a fewer number of times than the other two drinks (here 14) and a 
greater number of times as Coca Cola (19) and Pepsi Cola (20). 

Again, in line with our suppositions, although Vess Cola is “correctly 
identified” three times, it is misidentified six times as frequently (18 times) 
as Pepsi Cola, and on 16 occasions as RC. These findings add further 
support to our hypothesis that the identification response is not a func- 
tion of the physico-chemical properties of the stimulus objects but a mat- 
ter of using an available verbal tag or label for it. The best evidence for 
such an assertion was obtained during the course of the experiment from 
the Ss’ spontaneous remarks. After having judged the second or third
sample, S would frequently make a remark similar to the following culled
from our protocols: “Let’s see, what is the other Cola?”; “I’m only
acquainted with three colas”; and “I can’t even think of any others.”

In terms of percentages of correct responses, Table 2a will show a
close approximation to those of Table 1a of Part I. Only 26% of
identifications are correct here where each S gets four samples of the same
beverage as compared with 31% correct where four different beverages
are sampled.
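
The Part II percentages follow directly from the counts of correct identifications. A minimal sketch in Python, using the counts as read from Table 2a; the variable and function names are illustrative and not the authors':

```python
# Correct identifications per brand in Part II (counts as read from Table 2a).
correct = {"Coca Cola": 25, "Pepsi Cola": 21, "RC Cola": 14, "Vess Cola": 3}

# Each brand was presented 60 times: 15 Ss x 4 samples of the same beverage.
PRESENTATIONS_PER_BRAND = 60


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


per_brand = {b: percent(n, PRESENTATIONS_PER_BRAND) for b, n in correct.items()}
total_correct = sum(correct.values())
overall = percent(total_correct, PRESENTATIONS_PER_BRAND * len(correct))

for brand, pct in per_brand.items():
    print(f"{brand}: {pct:.0f}% correct")
print(f"All brands: {total_correct} of 240, {overall:.0f}% correct")
```

Rounded to whole percentages, these reproduce the 42, 35, 23, 5 and 26 per cent figures of Table 2a.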

Thus, when four cola drinks are presented, one of which is a “dark
horse,” the pattern of naming appears to be in terms of the names of the 
better known brands rather than the actual beverages used. It sug- 
gests that perhaps the Ss’ identification responses are a function of famili- 
arity of the verbal label as met with in their reactional biographies, 
through contact with the actual beverages, advertising posters, etc. 

If this latter hypothesis is correct, then the frequency of naming re- 
sponses for each of the four stimuli employed should show a patterning 
related to a familiarity with the names of the products and not to an 
actual taste discrimination and identification of the four stimuli. For 
example, the distribution of the 132 Coca Cola identification responses 
when four different stimuli were actually used should be distributed by 
chance. And so for the others. Actually, significance ratios applied to 
the difference between the frequency of our Ss’ identification responses 
and the expected frequency (Table 3) substantiate our premise, since the 
ratios here are consistently low, in one case only reaching as high as 2.13. 
Apparently, then, our Ss did not actually discriminate the four stimuli 
but merely applied a name from an available repertoire. When a third 
or fourth name was not readily recalled for identifying the third or 
fourth stimulus, Ss tended to repeat one of those previously used. 

In our opinion, such evidence indicates that the four beverages were 
not differentiated on gustatory grounds. Instead, S tried to think of a 
number of different names to apply to the four stimuli whether they were 
different as in Part I or the same as in Part II. Note Table 4 which 
presents significance ratios of the preceding analysis. In 14 cases the 
ratios are considerably below 1; in two cases only are they above 1, these 
being 1.50 and 1.06. Additional support is furnished in Table 5 which 
indicates no statistical significance in the differences of the naming re- 
sponse patterns when Ss get four different colas or four identical colas. 
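
The significance ratios of Table 5 appear to be the absolute difference between the Part I and Part II percentages divided by the standard error of that difference. The Python sketch below recomputes them under that assumption. The standard errors are taken from the table as printed rather than derived, and the function name is illustrative.

```python
# Values transcribed from Table 5: (P1 %, P2 %, SE of the difference).
table5 = {
    "Coca Cola": (43, 42, 3.04),
    "Pepsi Cola": (46, 35, 28.37),
    "RC Cola": (36, 23, 25.09),
    "Vess Cola": (1, 5, 2.69),
    "Totals": (31, 26, 6.30),
}


def significance_ratio(p1, p2, se_diff):
    """Absolute difference between two percentages over its standard error."""
    return abs(p1 - p2) / se_diff


ratios = {brand: significance_ratio(*row) for brand, row in table5.items()}

for brand, ratio in ratios.items():
    print(f"{brand}: {ratio:.2f}")
```

Rounded to two places, the computed values agree with the ratios printed in the last row of Table 5 (.33, .39, .52, 1.49, .79).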

Should further argument be needed for the validity of our inter- 
pretation, the reader is referred to the data presented in Table 6, which 
shows the correct identifications as well as mis-identifications of those of 
our Ss who expressed definite likes or dislikes for certain of the Cola 
drinks. Column 2 indicates the number of Ss that preferred or dis- 
liked the cola beverages listed in Column 1. Column 3 shows the number 
of the same Ss who correctly identified those beverages. But Column 4 
indicates that from one-third to over two-thirds of those same Ss applied 
the same name to one or more of the other cola drinks than the preferred 
or disliked one. For example, of the 61 Ss who expressed a preference for





Table 3


Significance Ratio Tests of the Hypothesis that the Distribution of the Various
Identification Responses (Part I of Experiment) to the Four Cola Beverages Are
Not on the Basis of Actual Taste

[Table body illegible in the source. Column group: How Identified (As Coca Cola, As Pepsi Cola, As R.C. Cola).]


Table 4


Significance Ratio Tests of the Hypothesis that the Distribution of the Various
Identification Responses (Part II of Experiment) to the Four Cola Beverages Are
Not on the Basis of Actual Taste

[Table body illegible in the source. Column group: How Identified (As Coca Cola, As Pepsi Cola, As R.C. Cola).]


310 N. H. Pronko and J. W. Bowles, Jr. 


Table 5


Significance Tests to Determine Whether Differences between Percentages in
Results of Part I and Part II are Real


                                   Brands of Cola Presented

                              Coca     Pepsi    RC       Vess
Statistic                     Cola     Cola     Cola     Cola     Totals

P1 (correctly identified
   in Part I)                 43%      46%      36%      1%       31%
P2 (correctly identified
   in Part II)                42%      35%      23%      5%       26%
P1 - P2                        1%      11%      13%      4%        5%
SE diff.                      3.04     28.37    25.09    2.69     6.30
Significance Ratios            .33      .39      .52     1.49      .79





Coca Cola, 28 “correctly identified” it but 19 of the 28 also gave the same
naming response one or more times to either Pepsi Cola, RC Cola or Vess 
Cola. Of the two Ss who express a preference for RC, neither identifies 
this beverage correctly. Other similar examples may be seen in Table 6. 


Table 6


Showing Number of Ss (in Part I of the Experiment) Correctly Identifying and
Mis-identifying Their Preferred or Disliked Brand of Cola


                                              No. of Same Ss “Correctly”
                                              Identifying Preference Who
               No. of        No. of           Also Mis-identified
Beverage       Ss            Same Ss          One or More Other
Reported as    Preferring    “Correctly”      Brands as the One
Preferred      Beverage      Identifying      Preferred

Coca Cola      61            28               19
Pepsi Cola     31            19                7
RC Cola         2             0                -
Vess Cola       0             -                -

                                              No. of Same Ss “Correctly”
                                              Identifying Dislike Who
               No. of        No. of           Also Mis-identified
Beverage       Ss            Same Ss          One or More Other
Reported as    Disliking     “Correctly”      Brands as the One
Disliked       Beverage      Identifying      Disliked

Coca Cola       6             5                2
Pepsi Cola     10             3                2
RC Cola        19             5                3
Vess Cola       0

One last point needs to be made. An examination of correct responses
in Table 1 could possibly be interpreted as showing a trend toward correct
identification but we believe such a “trend” to be spurious and a mere
function of applying the most familiar brand names. This is indicated 
in Table 2 both by the infrequency of names of cola drinks not used in the 
experiment as well as infrequency of correct identification of our fourth 
cola brand. Note that Cleo Cola is employed as a label 10 times as com- 
pared with the name of the actual beverage (Vess Cola) which shows up 
only four times. Again, Vess Cola in both Parts I and II rather than 
being identified as an unknown brand is misidentified as one of the three 
popular drinks. Furthermore, we have already called attention to the 
frequent “searching around” for a third or fourth name on the part of S, 
when he did not readily recall a variety of Cola names. 

In addition, it will be recalled that the data of Table 6 revealed a 
frequent repetition of the names of the correctly identified popular Colas 
to misidentification of the lesser known. Furthermore, we must stress 
the fact that there is no “trend” or clustering of correct responses in Part 
II of the experiment as shown in Tables 2 and 2a. Although Vess Cola 
is presented 60 times here it is correctly identified only three times but 
is misidentified a total of 54 times as Coca Cola, Pepsi Cola, or RC Cola. 
It is failure on the part of S to use this naming response that spuriously 
overloads the cells of the other three brand names. Finally, statistical 
tests of the distribution of naming responses for each of the four brands 
do not justify rejection of the null hypothesis. In conclusion, while the
writers are of the opinion that the foregoing evidence is sufficient to
establish their interpretation of the facts, further investigation is necessary to
corroborate these findings.


Summary 


A group of 108 Ss was asked to identify one oz. samples of each of the 
following four Colas: Coca Cola, Pepsi Cola, RC or Royal Crown Cola, 
and Vess Cola, presented in counter-balanced order. Ss’ identification 
responses (Total = 432) were distributed as follows: Coca Cola, 132; 
Pepsi Cola, 147; RC Cola, 112; Vess Cola, 4; Cleo Cola, 10; Rock Cola, 
2; Dr. Pepper, 6; Other, 7; Don’t Know, 12. 


1. There was a marked tendency to use only the three most familiar 
categories in the naming response so that the fourth brand was misiden- 
tified for the most part as one or the other of the three familiar colas. 
This was also true for the other three brands. 

2. Another group of 60 Ss, distributed into four subgroups of 15 Ss, 
each group being presented with four one oz. samples of the same beverage 
during the four successive trials, showed a similar tendency in its verbal 
responses. Misidentification of Vess Cola with the other drinks was
most frequent but the popular brands were also commonly misidentified 
with one another. 

3. Both groups also showed a slight frequency of misidentification of
the four brands with lesser known colas and other soft drinks not used
in the experiment.

4. Although Ss who preferred or disliked a Cola drink did sometimes 
identify it correctly, nevertheless, from one-third to over two-thirds as 
often they also called one or more of the other cola drinks by the same 
name. 

5. It was noted that after the second or third taste judgment, Ss 
frequently and spontaneously remarked that they could not recall any 
other cola names and would repeat a previously used brand name. 

6. Statistical tests of the distribution of naming responses for each of 
the four brands further support the view that our Ss did not discriminate 
the four brands on a gustatory basis but rather applied a readily available 
repertoire of cola-naming responses. 


Received October 16, 1947. 


The Reliability of Job Evaluation Rankings * 


Philip Ash 
The Pennsylvania State College 


While a number of papers have been published recently concerning 
the validity and internal consistency of job evaluation systems (1), (3), 
(4), (7), little systematic study seems to have been made of the reliability 
of job evaluation ratings.¹ Rather, on the one hand reliability has been
assured (or assumed) by the rating methods used, or on the other hand the 
limitations demonstrated in connection with personality and other trait 
ratings have been imputed to job evaluation ratings (5), (8). 

The writer believes that a great deal of merit inheres in both of these 
positions. In many companies using formal job evaluation systems job 
evaluation ratings are assigned by a process of group discussion and 
compromise. This practice reduces considerably the importance of in- 
dependent rater-to-rater consistency. It is also true that a priori analysis 
would reveal a great similarity between the process of assigning a point 
rating to a job with respect to, say, “physical effort requirements,” and 
the process of assigning a rating to an individual with respect to, say, 
“sociability.” To that extent the results of study of the latter type of
rating judgment apply equally to the former. 

The problem vis-a-vis job evaluation ratings still warrants independent 
investigation, however. In large companies it is common to find jobs 
“reallocated” and new jobs rated by a single analyst, for whose judgment 
there is no measure of reliability. Furthermore, there is good reason to 
believe that the analogy of personality trait ratings does not cover all 
salient points. 

This paper reports a single study of the reliability of rating judgments 
made by trained job analysts. It is believed that some of the results 
presented here will be applicable to any job evaluation installation, and 
will suggest directions for future investigation. 


* Grateful acknowledgment is made for permission to use data from the files of the 
Division of Occupational Analysis. Responsibility for all opinions and conclusions 
contained herein, however, is solely that of the writer. 

1 Since this paper was written, a study very similar in design has been published by 
Lawshe, C. H., and Wilson, R. F. Studies in job evaluation. 6. The reliability of two
point-rating systems. J. appl. Psychol., 1947, 31, 355-365. 
Description of the Project


One of the products of occupational research conducted by the Divi- 
sion of Occupational Analysis is Part IV of the Dictionary of Occupational 
Titles (9). This volume offers a classification of occupations into
homogeneous groups, “families” or “fields of work.” In each group the
occupations are related on the basis of similarities in knowledges, skills,
aptitude requirements, and other functional criteria. However, in
preparing a revision of the volume it was decided to order the occupations
within each group in terms of skill level. The skill level hierarchy could 
be used to indicate points of entry, promotional sequences, and channels 
of advancement. To accomplish this skill-level ordering, a point-rating 
job evaluation system was devised. Since it would not have been feasible 
to hold group rating discussions for each of the twenty-three thousand 
occupations to be included, some estimate of the probable reliability of 
individual analyst ratings was needed. A pilot study was therefore de- 
vised for this purpose. 

Five questions were posed for investigation: 





1. How reliably does the average analyst rate jobs (i.e., rater-to-rater
consistency)?

2. Do differences in consistency of rating appear between the various 
job evaluation factors? 

3. Are there any factors for which particular jobs cannot be rated con- 
sistently, even though the over-all consistency of rating on these factors 
is high? 

4. Are there any jobs which cannot be rated consistently, due to lack 
of information or for other reasons? 

5. To what extent do the factors overlap? 




The Job Evaluation System. A study was made of twenty-two of the
most widely-used job evaluation plans. Nine factors, including what
appeared to be all relevant components of the skill level of a job, were
selected and defined. The factors included knowledge, physical skills,
adaptability and resourcefulness, responsibility for material goods,
responsibility for safety, responsibility for the work of others, physical effort,
attention, and working conditions. The definitions were given in considerable
detail. For example, the definition for the factor physical skills read:
“The dexterities, coordinations, and muscular discriminations required
for successful manipulation of materials, tools, machines, or equipment to
effect successful job performance. Evaluate developed physical skills
involving any one or combination of: dexterity of fingers or other
members, coordination of senses, hands and/or feet. Consider the complexity
of necessary movements and the frequency and speed demanded; degree
of coordination required between sensory cues and movement responses;
accuracy or precision of movements or muscular judgments required;
repetitiveness of movements; independence of finger, hand, foot, and/or
leg movements.”

The Job Sample. A sample of twenty-seven occupational descriptions 
was selected for the purposes of this study. Each description was a 
composite summary of information collected in from two to twenty-five
independent job analyses. A description therefore constituted a detailed 
statement of the characteristics typically associated with the occupation, 
together with an indication of deviations from this average in the plants 
in which the basic job analyses were prepared. Each description included 
a statement of the duties performed in the occupation, machines, tools, 
and other work-aids used, working conditions and hazards involved, 
hiring requirements (sex, education, previous experience), promotional 
lines, and estimated worker characteristics (degrees of strength, dexterity, 
intellectual ability, and so forth) required for successful performance. 

It should perhaps be pointed out that the findings in this study relate 
to the reliability of ratings of the job descriptions. No analysis was made 
of the objectivity, reliability, or validity of these descriptions vis-a-vis 
the actual jobs. 

This is a problem common to all job evaluation programs, and one 
which might well merit study. What is usually subjected to rating is a 
brief abstract from the mass of data that constitutes a job. It may well 
be that points decisive for a fair skill level evaluation are overlooked, that 
biases and deficiencies in the original observations seriously distort rating
judgments. 

It may be pointed out, however, that in view of the fact that the 
Division has had over thirteen years’ experience in developing the tech- 
niques of job analysis, and has established methods that yield consistent 
agreement among job analysts with respect to the characteristics of jobs 
studied, there is a presumption in favor of the validity of the descriptions 
used here. 

The specific jobs included in the sample were: bookkeeper, bootblack, 
cabinetmaker, clerical checker, chef, ditch digger, deep-sea diver, spinning 
doffer, heating and ventilating equipment draftsman, garment factory 
foreman, gardener, hat designer, machine oiler, metal machinist, olive 
packer, paper cutter (machine), physician (on a ship), plasterer, bull- 
ladle pourer, president of a refrigerator manufacturing company, punch- 
press operator, housefurnishings salesman (retail), petroleum stillman, 
hand trucker, typist, night watchman, window calker. This range of
jobs is considerably greater than that found in a typical plant. It is 
probably only a very restricted sample of the jobs in which the USES 
is interested. 
The Analysts. Ten job analysts participated in the study. All of
them had had considerable experience in job analysis, occupational classi- 
fication, and related phases of occupational research. The range of ex- 
perience was from two to twelve years; the mean for the group was 4.8 
years’ experience. In addition, a training session was held to discuss the 
purposes of the study, to review the factor definitions, and to ensure that 
uniform procedures would be followed. 

Procedures. Each analyst was provided with a complete set of the
descriptions and the factor definitions. Since no point-values had been 
established, the analysts were instructed to rank the sample for each 
factor, treating each factor independently to reduce any halo effect which 
might operate. The job ranking lowest was assigned the number “1,”
the job which ranked highest was assigned the number “27.” As
necessary, adjustments were made for ties.


Statistical Findings 


Reliability of the Raters. For each job, the median of the analysts’ 
ranks for each factor was determined. The rankings of each analyst were 
correlated with the median array for each factor. Using the resulting 
correlation matrix a median coefficient was determined for each factor 
and for each analyst. 

In addition, for each factor an average intercorrelation coefficient was 
calculated. These coefficients are given in Table 1. 
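
The two reliability indices described above can be sketched in Python as follows. The ranks are hypothetical (three analysts, five jobs, one factor), and the paper does not name the correlation coefficient used; the sketch assumes a product-moment correlation computed on the ranks, which for untied ranks equals Spearman's rho.

```python
from itertools import combinations
from statistics import mean, median


def pearson(x, y):
    """Product-moment correlation; on untied ranks this equals Spearman's rho."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: three analysts rank five jobs on one factor (1 = lowest).
ranks = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 1, 3, 4, 5],
    "C": [1, 3, 2, 5, 4],
}

# Median rank of each job across analysts (the "median array").
median_array = [median(job) for job in zip(*ranks.values())]

# Correlation of each analyst's ranking with the median array.
with_median = {name: pearson(r, median_array) for name, r in ranks.items()}

# Median of those coefficients: one summary figure for the factor.
factor_median = median(list(with_median.values()))

# Average intercorrelation: mean correlation over all pairs of analysts.
pairs = list(combinations(ranks.values(), 2))
average_intercorrelation = mean([pearson(x, y) for x, y in pairs])

print(with_median)
print(factor_median, average_intercorrelation)
```

With these illustrative ranks the correlations with the median array are 1.0, 0.9 and 0.8, while the three pairwise correlations between analysts average about 0.77. Correlating each analyst with a pooled median thus tends to give higher figures than correlating analysts with one another.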

Analysis of Table 1 suggests that, given trained analysts, a very high 
degree of consistency in job evaluation ratings may be obtained. The 
range of the reliability of the analysts, expressed in terms of the median of 
the correlations made by the analyst on the nine factors, is from .81 to .94. 

Furthermore, of the ninety coefficients giving the correlation of the 
rankings of an analyst on a factor with the median rankings for the factor, 
forty-nine exceed .90, twenty-six are in the range .80 through .89, twelve 
are in the range .70 through .79, and only one is very low (.25). This 
last is the only coefficient which might have been obtained merely on the 
basis of chance expectancy. 

Differences in Consistency with Respect to the Factors. The reliability 
of the differences among the various correlation coefficients was not cal- 
culated. Examination of the quantities in Table 1 will indicate the 
general picture, however. 

The average intercorrelations are perhaps the most pertinent indices
with respect to factor consistency. As is to be expected, they are
somewhat lower than the medians of the individual correlations for each factor.
The average intercorrelations suggest the probable magnitude of the
reliability of a single ranking. They are therefore more revealing in
relation to the question of independent rater-to-rater consistency. It
should be noted that, while these coefficients all exceed chance expectancy
by a wide margin, they range in magnitude from .39 (“attention”) to
.93 (“adaptability”).

It is pertinent to observe, with respect to the factor “attention,” how- 
ever, that the coefficient reflects the extreme disagreements of analyst C. 
If analyst C’s rankings were dropped, the average intercorrelation coeffi- 
cient would be raised to .59, which would be comparable to those for 
“physical effort” and “responsibility for the work of others.”

As a guide to action, it seems reasonable to conclude that the job
information available and the understanding of the definitions were
satisfactory for the factors “knowledge,” “adaptability,” and “working
conditions.” On the other hand, improvement could be sought in the
remaining six factors.

This is borne out by an analysis of the range of ranks assigned each
job for each factor. The average rank-range for “knowledge,”
“adaptability,” and “working conditions” was 6.6 ranks, 7.7 ranks, and 8.6 ranks
respectively. On the other hand, the average rank-range for “attention”
was 15.4 ranks, for “physical skills” it was 11.5 ranks, and for
“responsibility for materials” it was 11.4 ranks. The spread of rankings for a
job on a factor provides a rough index of the consistency with which the
job is rated on the factor.
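
A minimal Python sketch of the average rank-range for one factor follows. The ranks are hypothetical, and the rank-range of a job is assumed here to be the highest rank minus the lowest rank given to it by the analysts; the paper does not spell out the definition.

```python
# Hypothetical ranks given to three jobs by four analysts on a single factor.
ranks_by_job = {
    "typist": [10, 12, 11, 15],
    "bootblack": [1, 2, 1, 3],
    "chef": [20, 17, 22, 19],
}


def rank_range(ranks):
    """Spread of the ranks assigned to one job (assumed: highest minus lowest)."""
    return max(ranks) - min(ranks)


ranges = {job: rank_range(r) for job, r in ranks_by_job.items()}
average_rank_range = sum(ranges.values()) / len(ranges)

print(ranges)
print(f"Average rank-range: {average_rank_range:.1f} ranks")
```

A small average indicates that the analysts placed each job at about the same point on the factor; a large one indicates disagreement.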

Consistency of Ratings of Jobs. Paterson (6) has suggested that it is 
questionable whether all job evaluation factors are equally applicable to 
all jobs to which they are applied. In an effort to determine whether any 
of the factors were inapplicable to any jobs, or whether any of the jobs in 
the sample were particularly difficult to rank with respect to particular 
factors, the analysts were asked to report their reactions in this respect. 

The analysts were uniformly of the opinion that all the factors were 
applicable to every job, if only to establish a point of the ranking con- 
tinuum. For example, “responsibility for the work of others,” which 
may be interpreted broadly as supervisory responsibility, was deemed 
applicable to clearly non-supervisory jobs. The factor provided a basis 
for making a skill level discrimination in this area. The same comment 
is applicable to any of the factors, on the proposition that absence of the 
factor from the job is itself pertinent to evaluation of the job. 

With respect to the particular jobs included in the sample, however, 
the question of applicability frequently became one of adequacy of infor- 
mation. For four jobs the rank-range on each factor was wide. The 
analysts reported that these were jobs for which the information was too 
ambiguous or too scanty to make a ranking judgment with confidence. 
It was also found that, for particular jobs, personal biases of one kind 
or another operated in one or more of the factors. These led to
interpretations at variance with the interpretations placed either on the job or
the factor by the group as a whole. For example, should the Physician’s
performance of operations be ranked low on “working conditions”? To
what extent is a worker responsible for the safety of others when he works
in high places and may drop his tools on another worker?

The information collected served as a basis for reconsideration of 
factor definitions, and pointed to the need for trial runs to ensure that 
factor definitions adequately cover all variations of application of the 
factor. 

Factor Overlap. In view of the small size of the sample and the defi- 
ciencies revealed in the data and the factor definitions, only a hasty study 
was made of factor overlap. The correlation between each pair of
median rankings was calculated. These correlations are given in Table 2.

It is obvious from Table 2 that the factors do not represent inde- 


Table 2


Intercorrelations Among Nine Job Evaluation Factors
Note: 27 jobs, correlations of median ranks of 10 analysts.


Factors                      1      2      3      4      5      6      7      8

1. Knowledge                --
2. Adaptability             .94    --
3. Physical Skills          .39    [?]    --
4. Responsibility for
   Materials                .71    [?]    [?]    --
5. Responsibility for
   Safety                   .21    .28    .36    .38    --
6. Responsibility for
   Work of Others           .90    .76    .19    .72    .44    --
7. Physical Effort         -.27   -.39    .23   -.28    .51   -.12    --
8. Attention                .81    .74    .40    .67    .43    .71   -.30    --
9. Working Conditions      -.29   -.08    .19   -.10    .82   .003    .84   -.12





pendent variables. In fact, some of them seem to be almost identical 
(“knowledge” and “adaptability”).


Summary and Conclusions 


In a pilot study designed to determine the reliability of job evaluation 
ratings made by trained analysts, ten analysts ranked twenty-seven jobs 
on nine factors. The following conclusions seem justifiable: 


1. That in general a high degree of reliability of analyst judgment may 
be anticipated. 
2. That consistency of rating appears to be in part at least a function
of the factor rated, and in part a function of the job information available.
It would therefore seem desirable to determine analyst reliability for each
factor independently, and to make adjustments accordingly.

3. That in a set of as many as nine factors it is probable that at least
some of the factors overlap and may be dispensed with. However, it
may be that the elimination of one of a pair of correlated factors will alter
appreciably the interpretation of the remaining factor, which will absorb
some of the implications of the deleted factor. It would seem desirable to
discover whether the overall order of jobs does in fact remain the same
when such eliminations are made.




The writer feels that certain qualifications of these results should be 
noted. In the first place, the reliability noted relates to the content of
job descriptions; whether the rankings are valid for the jobs themselves
remains to be determined. In the second place, the great variability of
jobs in this sample, as compared with the array of jobs usually found in a 
single plant, probably tended to increase the reliability coefficients. 
Finally, since ranks were used in this study, rather than a longer point- 
rating continuum, a slight loss in comparability with the usual job evalua- 
tion plan has possibly resulted. 


Received July 14, 1947. 
Special Review: Psychology in an Ideal University *


Walter V. Bingham 
1661 Crescent Place, Washington 9, D. C. 


The place of psychology in an ideal university is the theme of a little 
book which psychologists are reading with relish because it portrays the 
swift flowering of their young science into a cluster of technologies, and 
points to prevailing academic shortcomings which will be corrected only 
after recognition of these developments. 

This fascinating brochure is the report of a University Commission 
appointed by President Conant in the spring of 1945 to advise on the 
future of psychology at Harvard. He had asked the Commission ques- 
tions like these: 

“What different types of professional psychologists should be trained by 
the University and in what faculties? What coordination, if any, should we 
attempt between the work of the psychologists and the Psychiatrists in the 
different faculties? What instruction in Psychology is a desirable part of the 
general education of undergraduates or students in our various professional 
schools? Should we recognize the different types of Psychology by suitable 
labels on our professorships and separate methods of reviewing permanent 


appointments, or should we attempt to have an over-all committee on
Psychology which would be concerned with new appointments in the whole field?”


Under the chairmanship of the Rockefeller Foundation’s Director for 
the Medical Sciences, the Commission consisted of six eminent
psychologists and of six equally eminent representatives of related fields, four of
them connected with medicine. 

The report is in three parts. First the reader is reminded of the 
nature and the vastly expanded range of psychology; its methods, both 
traditional and novel; its unrealized potentialities for the whole field of 
education; and the strength and cogency of the appeal made to the general 
public as well as to students by this science and by its developing
psychotechnologies. From this section a reviewer is tempted to quote several
pungent paragraphs like this: 

“Most of the current reports and policy statements on education show an 
almost exclusive concern with curricula and their presumed effects. No doubt 
well meant, this ignorance of the realities produces a strangely impractical and 


doctrinaire effect. Actual experience in a dean’s office (or in the college 
physician’s office) reveals quite another set of considerations—namely, the 


* Alan Gregg and others. The place of psychology in an ideal university. Cambridge: 
Harvard University Press, 1947. Pp. x+42. $1.50. 
psychological adjustments of individual human beings, of teachers as well as
students, to the processes aimed at in the curricula. Do not the motivations
and personality of the student matter in education? Are we wise or even safe
in treating as negligible or unmanageable the varieties of student capacities,
tastes, and temperaments? Does not this aspect of education in a free society
need much more attention and explicit care? We insist that it does.”
And again: “The threat that psychology may pass into the hands of those
not recognized by the profession is not an idle one. Nor are the psychologists
the only professional group whose status would be imperiled by their own
intransigence. What physicians refuse to treat goes elsewhere—to the
disadvantage of medicine as a field of learning. The major opportunity, then,
for intelligent applications of psychological knowledge and skills appears in the
diagnostic study, educative treatment, and vocational guidance of the
presumptively normal individual.”





In Part 2 the Commission reviews the main purposes of psychology 
in a university. These are not merely to make available to beginners 
and to advanced students a body of established facts and laws—an appli- 
cable content. Most valuable is the revelation that so many human 
functions, not excluding those of emotion, motivation and ways of think- 
ing, can be approached objectively, dispassionately, scientifically. The 
value of other fields of scholarship to the student of psychology, and of 
psychology to students in other fields and in the professional schools, is 
appraised. 

After these preliminaries the Commission briskly approaches its more 
specific task of formulating recommendations regarding a plan of policy, 
organization, courses and facilities for a Department of Psychology. 

Five points of policy are called axiomatic: 
“For the young and growing subject that is psychology, opportunity and
freedom for change and development must be provided; working contact
between all the psychologists in the university should be created and maintained;
the department should be large enough to contain, and by its spirit and
resources interest and reward, a wide variety of psychologists; psychology already
occupies and bids fair in increasing measure to hold a place in a larger number
of professional schools than any other academic subject; the time is at hand
for the creation of a course to prepare professional psychologists qualified to
practice or apply psychological knowledge as psychotechnologists, clinical
psychologists, personnel managers, or experts in vocational guidance.

“In consequence of these major assumptions,” the Commission continues, 
“every psychologist in the university should be attached to the Department of 
Psychology. . . . The present complete independence of a psychologist in a
professional school from the psychologists in the Department weakens the
Department and isolates the individual from the main-stream of the very
subject he represents.”




The introductory course, which should not be required of all under- 
graduates, is described. Brief suggestions are offered with respect to the 
content of pre-professional courses in psychology for those who choose to 
prepare for careers in medicine, business, engineering, teaching, the min- 
istry, the law, and the fields of sociology and government. 
“For which of these professions is a deliberate ignorance of the subjects
recommended an advantage? And in which of these subjects may one assume
that modern psychology can add nothing to the current credos and
incredulities?”
These courses would be taught by psychologists holding joint appoint- 
ments in their professional schools and in the Department of Psychology. 
“If administrative arrangements make this difficult they should be changed. 
. . . Strong as may be the tradition of separatism and autonomy in the various
faculties of a university, a larger measure of interchange, collaboration, and
mutual confidence between the faculties deserves a ten years’ trial.” 


A professional degree, the doctorate in psychology, should be given in 
the Department of Psychology to persons it has trained as practicing 
psychologists. Concurrence in this recommendation on the part of mem- 
bers of the Commission was not unanimous, although they were all in 
hearty agreement that the ideal university must offer much more thorough 
training and supervised experience than heretofore to prepare students 
to enter on the practice of a psychological profession. The report does 
not seriously discuss the possibility that training for this Psy.D. degree 
might be entrusted to a new professional school of psychology analogous 
to those which prepare for the professions of divinity, business, law, edu- 
cation or medicine, leaving the basic science to be developed and taught 
by the academic department in the faculty of arts and sciences. 

The profession of psychology is in its adolescence, a period of rapid 
growth. Reaching out gawkily in many directions, only a few of which 
are touched upon in this little volume, it is struggling toward a mature 
professional status comparable with the honored academic status it has 
maintained for sixty years. 

Since the Harvard Commission surveyed the scene, the American 
Psychological Association has undertaken its new responsibilities of ex- 
amining for certification as to professional competence those of its mem- 
bers who want to practice in one or more of several specified fields of ap- 
plication. Under these circumstances, why not postpone consideration of 
any new professional degree until sufficient experience has given a clear 
answer to the question as to whether certification cannot be so well ad- 
ministered that it accomplishes what is required in the way of definition 
and maintenance of professional standards? 

One other recommendation in the report, that concerning unity of 
organization in a single department, is provoking lively argument. The 
comprehensive department, as the Commission calls it, has long since 
demonstrated its strength and fertility in a great expanding university 
such as Ohio State. On the other hand, the fact that progress can also be 
made and standards maintained in a university where several relatively 
independent units exist, in the college of liberal arts, the school of
education, and such professional schools as business, engineering, law and
medicine, is supported by a glance in the direction of Chicago, for example, 
or Michigan, or most spectacularly, Harvard itself, where the centrifugal 
force has been extreme. The shepherd of a large flock of psychologists, 
of whom the most brilliant are not unlikely to include individualists 
deeply immersed in their unique problems, has no easy task to keep them 
in one fold—in one community of scholars when they want to form at 
least seven sharply separated communities named Experimental Psychol- 
ogy, Social Psychology, Clinical Psychology, Psychobiology, Industrial 
Psychology, Educational Psychology, and Medical Psychology; while a 
few find isolated shelter and plenty of work in an institute for child devel- 
opment or a school for nurses or for psychiatric social workers, or in the 
offices of admissions, student health, the vocational and educational 
counseling service, or a bureau dedicated to reduction of highway traffic 
accidents. 

Intermediate between the extremes typified by Harvard and Ohio 
State are universities such as Minnesota where team work among all the 
psychologists whatever their duties has been effectively encouraged partly 
through a relatively simple organizational structure but chiefly by long- 
continued planned cultivation of the purpose to collaborate. 

The suggestion has been offered that a sharing of mutual helpfulness 
among the psychologists of a great university would be facilitated by 
bringing them together under one roof. A generous alumnus might 
provide one vast building to house the entire group; and this should help 
even though certain psychologists would yearn for closer intimacy with 
colleagues in other departments: government, labor relations, neural anat- 
omy, machine design, biochemistry, sociology, or the Dean’s office; and 
a few, preferring insulation from their own kind, would seek hide-aways 
far from this central building. 

After all, community among scholars is fostered principally by the 
will of each to share actively in the thinking and interests of his fellows. 
In an ideal university, this purpose is widespread and rooted deep. 

There are times when the drive toward organizational growth by 
fission becomes powerful, as it did in the American Psychological Associa- 
tion during the ’30s when hundreds of practicing psychologists, fed up 
with years of relatively sterile programs and the narrowly academic 
preoccupations of their venerable society, organized the American As- 
sociation for Applied Psychology which promptly burgeoned into a vigor- 
ous, aggressive enterprise. The parent group then awoke to the fact that 
the majority of its members were practitioners, not teachers, of psy- 
chology. Facing this reality, its policies, structure and programs were 
drastically revamped, and the secessionists willingly came home. 

So it may be, some day, in the Ideal University. 


Correction 
To the Editor: 


In the article “Communication Between Management and Workers”
by Paterson and Jenkins which appeared in the February, 1948, issue of 
Journal of Applied Psychology, is an erroneous statement on page 72 
which I should like to correct. The statement is as follows: “Presumably, 
information about scientific management procedures and techniques is 
transmitted from management to workers by verbal means only.” 

This is an unwarranted assumption and contrary to the facts. Refer- 
ences to the following pages (and others), in my book “Industrial Man- 
agement in Transition” will serve as evidence on this point: Pages 63, 
70, 162, 248, 253, 269. For example, on page 70 where “mechanisms” 
are discussed, one of the mechanisms listed is “instruction cards for the
workmen.” These, of course, are written. In Taylor’s work, the
importance of “standard written instructions” is emphasized, although no
emphasis was placed upon typography. The charts of Henry L. Gantt
were also means of communication as were the Gilbreth, and more recently,
Mogenson’s micromotion studies.
Signed: George Filipetti 


Professor of Economics and Business Administration, 
University of Minnesota 


Reply 


Dr. Filipetti is technically correct. Incidental mention is made of 
written instructions and communications on the pages cited, but nowhere 
is the problem of scientifically constructed written communications to 
employees mentioned. The index of the book seems to be deficient. 
Nowhere can we find in the index such items as instruction cards, com- 
munications, written communications, charts, means of communication, 
understanding of written instructions, or any other term suggesting that 
scientific management people seriously considered, at any time, scientific 
means of insuring comprehension of management communications on the 
part of workers. 

Signed: Donald G. Paterson and James J. Jenkins 

Department of Psychology, 

University of Minnesota 

Zeisel, Hans. Say it with figures. New York: Harper and Brothers,
1947. Pp. 250. $3.00. 


This book is written by Hans Zeisel, who was in Vienna with Lazars- 
feld, and who has worked with him in the Columbia Bureau of Social 
Research, and is now Manager of Research Development for McCann- 
Erickson. It deals with methods for handling questionnaire data of the 
sort obtained in attitude, opinion, and consumer research. The book is 
unique, has an excellent selection of topics, and overlaps zero with the 
usual statistics book. In fact, it isn’t statistics at all, but rather is com- 
mon sense and a little arithmetic. Zeisel demonstrates his points with 
abundant research findings from the files of the Bureau of Social Research, 
from advertising, and from public opinion results. It is a most welcome 
and needed addition to the literature. 


Kenneth E. Clark 
The University of Minnesota 


Gordon, H. Phoebe, Densford, Katherine J. and Williamson, Edmund G.
Counseling in schools of nursing. New York: McGraw-Hill, 1947. Pp.
xiii + 279. $3.00.


This book has been prepared for administrators, teachers, supervisors, 
head nurses and all other individuals who, through their contacts with 
students in schools of nursing, contribute to the success of the adjust- 
ments which students make. The authors hope the book will also be of 
value to hospital administrators in helping them to develop greater un- 
derstanding of the problems of student nurses. 

The book is divided into four parts. Part One, The Professional Back- 
ground of the Student in Nursing, includes a brief presentation of the 
historical development of nursing as a service and as a profession. The 
authors discuss the problems which students face as they attempt to 
function as learners who are being educated to render professional service 
in both its curative and preventive aspects, and as workers in the
hospital who carry responsibility for the care of sick patients. The situation
is further complicated by the fact that students have experiences in many 
types of organizations including hospitals, general and special; public 
health agencies; and out-patient departments. The curricular experi- 
ences of students include organized class-room instruction, and a clinical 
curriculum which is carried forth in a milieu of myriad relationships with 
patients and many other individuals. Because of the complexity of
present day nursing education, and the increased demands which are 
placed upon nurses, the importance of an adequate personnel program 
is stressed. 

In Part Two, Understanding the Student in Nursing, the psycholo- 
gical characteristics and the social background of student nurses are 
discussed. In Part Three, Counseling and Personnel Services in Schools 
of Nursing, seven chapters are devoted to discussions of the nature of 
student counseling and personnel services, measures used in selecting and 
counseling students, student orientation, student counseling service, dis- 
ciplinary counseling, student health programs, and extra-curricular acti- 
vities. Part Four, Developing the Personnel Program in a School of 
Nursing, presents problems which are involved in the organization of the 
program, and gives suggestions for both organization and continuing de- 
velopment of the program. 

This book, which comes at a time when nursing leaders and hospital 
administrators are seeking answers to pressing personnel problems, is a 
valuable contribution to nursing literature. The book is so well organized 
and the material so clearly presented that it should be of definite value to 
individuals who do not have a broad background in personnel work. 
Though no attempt is made to discuss the underlying philosophy of
personnel work as such, the interrelationships between the needs of the in- 
dividual and the needs of society are emphasized in every section of the 
book. The chapters on student counseling and student discipline seem
of particular value. Though the employment of a qualified personnel 
director is considered an essential step in the development of the program, 
the counseling functions of all individuals who come in contact with 
students are repeatedly emphasized. 

This book, in keeping with its underlying philosophy, is not in the least 
authoritative. The individual who looks for definite answers to prob- 
lems of a particular situation will not find them. However, the person 
who wishes to develop an understanding of the basic principles upon which 
the entire counseling program is founded, and to utilize those principles 
in developing a program which meets the needs of a particular situation 
will find this book of great value. As a basic text in courses in personnel 
work for graduate nurses, supplemented with suggested readings which 
are given at the close of each chapter, this book will also fill a definite need. 


Helen Nahm 
Division of Nursing Education, 
Duke University, Durham, N. C. 

Bonnardel, R. L’adaptation de l’homme à son métier. (2nd ed.) Paris:
Presses Universitaires de France, 1946. Pp. 199. 120 French francs.

The title and, even more so, the sub-title (A study in social and
industrial psychology) indicate a broad scope. Actually, the content is
limited to an exposition of psychometrics, with special reference to
vocational guidance and industrial selection. The author, who started his
research career by physicochemical studies on excitability of muscles and
isolated nerves, utilizes to advantage both his knowledge of the theory of
quantitative psychology and his extensive experience as an industrial
psychologist for the Peugeot Automobile Company. After a survey of
pseudo-scientific (physiognomy, graphology) and traditional techniques
of evaluating the aptitudes of applicants for employment (application
blank, recommendations, interview, job trial), criticized on account of
their subjectivity, the author presents the history and principles of
psychometrics. Miniature job situations (“synthetic tests”) are
mentioned but the emphasis is put on the analytical (componental) approach
and the statistical treatment, applied to both the test scores and the
criteria of job proficiency, with preference for Thurstone’s system of
factor analysis. Steps to be taken in establishing psychometric services
in a plant are described.

Laboratory of Physiological Hygiene,
University of Minnesota 


Ryan, Thomas A. Work and effort: The psychology of production. New 
York: The Ronald Press Company, 1947. Pp. xii + 323. $4.50. 


A few universities already have in their curricula a course entitled
“Experimental Industrial Psychology.” This book would make an
excellent textbook for such a course and may stimulate the setting up of
more courses of this nature. For Ryan does not make ex cathedra pro- 
nouncements on the topics he is surveying but shows the reader how the 
conclusions were arrived at by giving him a summary of the research 
methods and results upon which the conclusions were based. The ap- 
proach leads naturally to a consideration of the validity of the findings 
and keeps the interpretations close to the facts. It serves to warn the 
applied psychologist of the danger of making recommendations beyond 
the data at hand and to stimulate industry to support basic psychological 
research with the same understanding with which they have underwritten 
engineering research on their technical problems. 

The content of the book is organized around the two basic problems 
of efficiency and motivation in work activities. “Efficiency” is defined as 
the ratio between input and output, both of the variables being considered 
in a broad sense; output includes worker satisfaction as well as rate of
performance and input includes all adverse effects of the work—energy
expenditure, effort, fatigue, effects upon health, personal adjustment in
society, the worker’s time, etc. In contrast to efficiency, in which the 
problem is to increase the output obtained from a given level of effort, 
“motivation” is concerned with raising the level of effort and thus in- 
creasing the worker’s productivity. 

So far as basic methodology is concerned, the chief problem is
measuring the input variable and Ryan devotes three chapters to the various
approaches to measurement of the cost of work. Work activities are 
classified into two major categories: muscular work and sedentary work. 
Measures of cost in terms of energy expenditure, reduced capacity, meta- 
bolic changes, fatigue tests, long-term production trends, etc. are evalu- 
ated and the specific limitations of each are pointed out. A final sum- 
mary groups the various measures into three classes: (1) promising indices 
still in the developmental stage, (2) a few established measures which have 
limited application, (3) crude indices of limited validity, but which pro- 
vide rough solutions until more refined methods become practically useful. 

In contrast to the extended treatment of the basic problem of measure- 
ment of the cost of work to the worker, the surveys of factors affecting 
efficiency are relatively brief. In line with the author’s high standards 
of research methodology, selected studies are described and cautious con- 
clusions drawn with respect to the influence upon efficiency of noise, tem- 
perature, ventilation, illumination, hours of work, rest periods, and sleep. 
A brief chapter on work methods presents a critique of motion study 
as viewed by a psychologist. 

The topics described above account for about half the book and rep- 
resent a rather unified survey of the problem of efficiency. The second 
half of the book presents separate discussions of incentives, boredom, rate 
setting, merit rating and job evaluation, accident control, and industrial 
training. In each case, however, the experimental viewpoint is main- 
tained and the relevance of the findings to efficiency and motivation is 
pointed out. Of particular interest are the evaluation of time study from 
a psychological viewpoint and the rather devastating attack on rating 
methods in common use. The most extensive treatment is of accident 
control, presumably because of the greater amount of experimental data 
in that area. 

Considering the magnitude and difficulty of the task which the author 
set for himself, the outcome is to be admired. The nature of the approach 
necessitated a detailed treatment of selected research rather than a sum- 
mary of all available data. Probably no two authors would agree on 
which studies should be included in such a treatment but this reviewer 
found discussed most of the basic studies with which he is familiar. The
most notable omissions were the RCA studies on music in industry and the 
Dartmouth studies on visual fatigue. 

Like other writers, Ryan attempts to clarify some of the basic concepts 
by definitions of terminology such as efficiency, effort, energy expenditure, 
etc., and to organize the concepts into a logical framework. This is not 
the place to evaluate the validity or usefulness of his particular set of 
constructs. If it stimulates further research in this area it will have 
served its major purpose. 


Albert S. Thompson
Vanderbilt University 


Franziska Baumgarten, Die Psychologie der Menschenbehandlung im
Betriebe, 2nd ed., Zurich, Rascherverlag, 1946, pp. 304.


The book discusses the psychology of inter-personal relationships be- 
tween the executive and the subordinate in industry in a mainly non- 
quantitative and practical manner. Such relationships are now generally 
recognized as a factor of paramount importance in determining the prog- 
ress of industrial enterprise. 

Employers are frequently interested in very general suggestions which 
can be used with all employees in order to produce an increase in efficiency 
or a harmonious atmosphere in industry. However, the author stresses
that the control of personal relationships in industry can only be effective 
if based on an understanding of the psychology of the employee. Each 
worker should be treated as an individual with all his assets, weaknesses 
and idiosyncrasies. Only after having studied and understood the per- 
sonality of his subordinate can a supervisor, foreman, etc. choose the most 
appropriate approach. 

The author presents an extensive study of different types of employers 
and employees, describing their approach to their work, to their superiors, 
subordinates, and co-workers. She describes in detail and explains 
thoroughly their respective reactions at the time which she calls “the 
critical moment” of their relationships, when an order is given and
executed, when control is exercised or submitted to, when a reproof is
expressed or received, when a punishment is given or taken.

The book should be useful to managers, supervisors, foremen, etc. in 
helping them understand the group of people by whom they are sur- 
rounded in their work. Many of the suggestions given in great quantity 
could probably prove very effective in dealing with human material if 
intelligently applied by men in executive positions. The author is 
strongly aware of the fact that intelligent methods in industrial training 
and guidance cannot always be produced by suggestions, even if they are 
carefully followed, but that the use of such practices is strongly dependent
on the executive’s own emotional make-up. 

We can congratulate ourselves that in this country relations between 
the executive and the subordinate appear to represent a real partnership 
more than in Switzerland, where the book was published; that the power 
of the executive is counterbalanced by a sense of self-respect and dignity 
of the American worker. 

The book is written in a fluent and precise style. The author follows 
her line of thought throughout the book with impeccable logic. 

Michael Joelson 


David Webb Company, 
Edinburg, Indiana 

Books, monographs, and pamphlets for listing and possible review should be sent to
Donald G. Paterson, Editor, Department of Psychology, University
of Minnesota, Minneapolis 14, Minnesota

Office library of an industrial relations executive. Helen Baker.
Princeton: Industrial Relations Section, Princeton University, 1946. Pp.
35. $.50.

American junior colleges. Second Edition. Jesse P. Bogue, Editor.
Washington, D. C.: American Council on Education, 1948. Pp. 500.
$6.50.

American universities and colleges. Fifth Edition. A. J. Brumbaugh,
Editor. Washington, D. C.: American Council on Education, 1948.
Pp. 1100. $8.00.

Case histories in clinical and abnormal psychology. Arthur Burton and
Robert E. Harris, Editors. New York: Harper and Brothers, 1947.
Pp. 680. $4.00.

Reading and visual fatigue. Leonard Carmichael and Walter F.
Dearborn. Boston: Houghton Mifflin Co., 1947. Pp. 483. $5.00.

Hearing and deafness. Hallowell Davis, Editor. New York: Murray
Hill Books, Inc., 1947. Pp. 496. $5.00.

Handbook of job facts. Alice H. Frankel. Chicago: Science Research
Associates, 1948. Pp. 160. $2.00.

Guidance testing. Clifford P. Froehlich and Arthur L. Benson. Chicago:
Science Research Associates, 1948. Pp. 104. $1.00.

Educational measurement and evaluation. N. L. Gage and H. H.
Remmers. New York: Harper and Brothers, 1943. Pp. 580. $3.60.

Personnel and industrial psychology. Edwin E. Ghiselli and Clarence W.
Brown. New York: McGraw-Hill Book Co., Inc., 1948. Pp. 475.
$4.50.

A trade union analysis of time study. William Gomberg. Chicago:
Science Research Associates, 1948. Pp. 256. $4.25.

The contemporary American family. Ernest R. Groves and Gladys H.
Groves. Philadelphia: J. B. Lippincott Co., 1947. Pp. 838. $4.50.

Theories of learning. Ernest Hilgard. New York: D. Appleton-Century
Co., Inc., 1948. Pp. 409. $3.75.

Encyclopedia of vocational guidance. Volume I and II. Oscar J. Kaplan,
Editor. New York: Philosophical Library, 1948. Pp. 1422. $18.50.

Psychological atlas. David Katz. New York: Philosophical Library,
1948. Pp. 142. $5.00.

Psychological warfare. Paul M. A. Linebarger. Washington, D. C.: 
Infantry Journal, 1948. Pp. 259. $3.50. 

The psychology of abnormal people. John J. B. Morgan and George D. 
Lovell. New York: Longmans, Green and Co., Inc., 1948. Pp. 673. 
$4.50.

A guide to confident living. Norman V. Peale. New York: Prentice-Hall, 
Inc., 1948. Pp. 248. $2.75. 

International directory of opinion and attitude research. Laszlo Radvanyi, 
Editor. Mexico: National University of Mexico, 1948. $6.00 paper, 
$7.00 cloth. 

Mental health in modern society. Thomas A. C. Rennie and Luther E. 
Woodward. New York: The Commonwealth Fund, 1948. Pp. 424. 
$4.00. 

Logic and scientific methods. Herbert L. Searles. New York: The 
Ronald Press Co., 1948. Pp. 326. $3.50. 

Vocational counseling and placement in the community in relation to labor 
mobility, tenure, and other factors. Pamphlet 5. Carroll L. Shartle. 
New York: Social Science Research Council, 1948. Pp. 18. $.25. 

Psychiatry for the pediatrician. Hale F. Shirley. New York: The Com- 
monwealth Fund, 1948. Pp. 442. $4.50. 

How to develop your executive ability. Daniel Starch. New York: Harper 
and Brothers, 1946. Pp. 267. $3.00. 

Industry and society. William F. Whyte, Editor. New York: McGraw-
Hill Book Co., Inc., 1946. Pp. 211. $2.50.

The psychology of teaching. Asahel D. Woodruff. New York: Longmans, 
Green and Co., Inc., 1948. Pp. 273. $3.00. 

Child care, questions and answers. Children’s Welfare Federation of New 
York City. New York: Doubleday and Co., 1948. Pp. 159. $2.00. 

Labor force definition and measurement. New York: Social Science Re- 
search Council, 1947. Pp. 134. $1.00. 

The nation’s most prosperous industry. New York: Textile Workers 
Union of America, CIO, 1948. Pp. 24. 

Opportunities for psychologists, psychiatrists, psychiatric social workers. 
Pasadena: Western Personnel Institute, 1948. Pp. 38. $1.00. 

Annual report of the Federal Security Agency. Washington, D. C.: Super- 
intendent of Documents, U.S. Government Printing Office, 1947. 
Pp. 41. $.15. 